MULTISCALE LOCAL CHANGE POINT DETECTION
WITH APPLICATIONS TO VALUE-AT-RISK 

By Vladimir Spokoiny 

Weierstrass-Institute and Humboldt University Berlin 

This paper offers a new approach to modeling and forecasting
of nonstationary time series with applications to volatility modeling
for financial data. The approach is based on the assumption of local 
homogeneity; for every time point, there exists a historical interval of 
homogeneity, in which the volatility parameter can be well approxi- 
mated by a constant. The proposed procedure recovers this interval 
from the data using the local change point (LCP) analysis. After- 
ward, the estimate of the volatility can be simply obtained by local 
averaging. The approach carefully addresses the question of choosing 
the tuning parameters of the procedure using the so-called "propa- 
gation" condition. The main result claims a new "oracle" inequality 
in terms of the modeling bias which measures the quality of the local 
constant approximation. This result yields the optimal rate of esti- 
mation for smooth and piecewise constant volatility functions. Then, 
the new procedure is applied to some data sets and a comparison 
with a standard GARCH model is also provided. Finally, we discuss 
applications of the new method to the Value at Risk problem. The 
numerical results demonstrate a very reasonable performance of the 
new method. 

1. Introduction. This paper presents a novel approach to modeling of 
nonstationary time series based on the local parametric assumption, which 
means that the underlying process having an arbitrary nonstationary struc- 
ture can, however, be well approximated by a simple time-homogeneous 
parametric time series within some time interval. 

Since the seminal papers of Engle (1982) and Bollerslev (1986), modeling 
the dynamic features of the variance of financial time series has become one 
of the most active fields of research in econometrics. New models, different 
applications and extensions have been proposed, as can be seen by consult- 
ing, for example, the monographs of Engle (1995) and of Gourieroux (1997). 



Received March 2008. 

AMS 2000 subject classifications. Primary 62M10, 62G05; secondary 62P20, 62G10. 
Key words and phrases. Volatility model, adaptive estimation, local homogeneity. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2009, Vol. 37, No. 3, 1405-1436. This reprint differs from the original in 
pagination and typographic detail. 






The main idea behind this strain of research is that the volatility clustering
effect that is displayed by stock or exchange rate returns can be modeled
globally by a stationary process. This approach is somewhat restrictive and
does not fit some characteristics of the data, in particular the fact that the
volatility process appears to be "almost integrated," as can be seen from
usual estimation results and from the very slow decay of the autocorrelations
of squared returns. Other global parametric approaches have been proposed
by Engle and Bollerslev (1986) and Baillie, Bollerslev and Mikkelsen (1996) 
in order to include these features in the model. Furthermore, continuous 
time models, and in particular diffusions and jump diffusions, have also 
been considered [see, e.g., Andersen, Benzoni and Lund (2002) and 
Duffie, Pan and Singleton (2000)]. 

However, Mikosch and Starica (2000b) showed that long memory effects 
of financial time series can be artificially generated by structural breaks 
in the parameters. This motivates another modeling approach, which bor- 
rows its philosophy mainly from the nonparametric statistics. The main 
idea consists of using a simple parametric model for describing the condi- 
tional distribution of the returns but allowing the parameters of this dis- 
tribution to be time dependent. The basic assumption of local time ho- 
mogeneity is that the variability in returns is much higher than the vari- 
ability in the underlying parameter which allows for estimating this pa- 
rameter from the most recent historical data. Some examples of this ap- 
proach can be found in Fan and Gu (2003), Dahlhaus and Rao (2006) and 
Cheng, Fan and Spokoiny (2003). Furthermore, Mercurio and Spokoiny (2004) 
proposed a new local adaptive volatility estimation (LAVE) of the unknown 
volatility from the conditionally heteroskedastic returns. The method is 
based on pointwise data-driven selection of the interval of homogeneity for 
every time point. The numerical results demonstrate a reasonable performance
of the new method. In particular, it usually outperforms the standard
GARCH(1, 1) approach. Härdle, Herwartz and Spokoiny (2003) extend
this method to estimating the volatility matrix of the multiple returns, and
Mercurio and Torricelli (2003) apply the same idea in the context of a
regression problem.

The aim of the present paper is to develop another approach which, 
however, applies a similar idea of pointwise adaptive choice of the inter- 
val of homogeneity. One essential difference between the LAVE approach 
from Mercurio and Spokoiny (2004) and the new procedure is in the way 
of testing the homogeneity of the interval candidate. In this paper, we follow
Grama and Spokoiny (2008) and systematically apply the approach
based on the local multiscale change point analysis. This means that for
every historical time point, we test for a structural change at this point
at the corresponding scale. The largest interval not containing any change
is used for estimation of the parameters of the return distribution. This
approach has the important advantages of being easy to implement
and very sensitive to structural changes in the return process.
We carefully address the question of selecting the tuning parameters of 
the procedure, which is extremely important for practical applications. The
proposed "propagation" approach suggests tuning the parameters under
the simple time-homogeneous situation to provide the prescribed performance
of the procedure. This choice is justified by the theoretical results from
Section 4, which claim the "oracle" properties of the resulting estimate in
the general situation. Another important feature of the proposed procedure
is that it can be easily extended to multiple volatility modeling [cf.
Härdle, Herwartz and Spokoiny (2003)].

The change point detection problem for financial time series was considered
in Mikosch and Starica (2000a), but they focused on asymptotic
properties of the test if only one change point is present. Kitagawa (1987)
applied non-Gaussian random walk modeling with heavy tails as the prior
for the piecewise constant mean for one-step-ahead prediction of nonstationary
time series. However, the mentioned modeling approaches require a
substantial amount of prior information about the frequency of change points
and their size. The new approach proposed in this article does not assume
smooth or piecewise constant structure of the underlying process and does
not require any prior information. The procedure proposed below in Section 3
focuses on adaptive choice of the interval of homogeneity, which allows
one to proceed in a unified way with smoothly varying coefficient models and
change point models.

The proposed LCP approach is quite general and can be applied to
many different problems. Grama and Spokoiny (2008) studied the problem
of Pareto tail estimation, Giacomini, Härdle and Spokoiny (2007) considered
time varying copulae estimation, and Čížek, Härdle and Spokoiny (2007)
applied it to compare the performance of global and time varying ARCH
and GARCH specifications. A comprehensive study of the general LCP procedure
is to be given in the forthcoming monograph [Spokoiny (2008)].

The theoretical study given in Sections 2 and 4 focuses on two important
features of the proposed procedure: stability in the homogeneous situation
and sensitivity to spontaneous changes of the model parameter(s).
We particularly show that the procedure provides the optimal sensitivity to
changes for the prescribed "false alarm" probability. Note that the classical
asymptotic methods for stationary time series do not apply in the considered
nonstationary situation with possibly small samples, which requires developing
new approaches and tools. Our way of analysis is based on the so-called "small
modeling bias" condition, which generalizes the famous bias-variance trade-off.
The main result in Theorem 4.7 claims that the procedure delivers the
estimation accuracy corresponding to the optimal choice of the historical
interval. It is worth mentioning that the result applies to every volatility
process, including piecewise constant, smoothly varying or mixed structures.

The paper is organized as follows. Section 2 describes the local paramet- 
ric approach for the volatility modeling and presents some results about the 
accuracy of the local constant volatility estimation. Section 3 introduces the
adaptive modeling procedure. Section 4 discusses theoretical properties of the
procedure in the general situation and for two particular cases: a change point
model with piecewise constant volatility and a volatility function smoothly
varying in time. Section 5 illustrates the performance of the
new methodology by means of some simulated examples and real data sets.
Note that the same procedure with the default setting is applied for all the 
examples and applications, and it precisely follows the theoretical descrip- 
tion. Section 5.4 discusses applications of the new method to the Value at 
Risk problem. 

2. Volatility modeling. Local parametric approach. Let $S_t$ be an observed
asset process in discrete time, $t = 1, 2, \ldots$, while $R_t$ defines the
corresponding return process: $R_t = \log(S_t/S_{t-1})$. We model this process via the
conditional heteroskedasticity assumption:

$$R_t = \sigma_t \varepsilon_t,$$

where $\varepsilon_t$, $t \ge 1$, is a sequence of independent standard Gaussian random
variables and $\sigma_t$ is the volatility process which is in general a predictable
random process, that is, $\sigma_t$ is measurable with respect to $\mathcal{F}_{t-1}$ with
$\mathcal{F}_{t-1} = \sigma(R_1, \ldots, R_{t-1})$ (the $\sigma$-field generated by the first $t - 1$ observations).

In this paper [similar to Mercurio and Spokoiny (2004)] we focus on the
problem of filtering the parameter $f(t) = \sigma_t^2$ from the past observations
$R_1, \ldots, R_{t-1}$. This problem naturally arises as an important building block
for many tasks of financial engineering like Value at Risk or Portfolio
Optimization.

We start the theoretical analysis from the simplest homogeneous case,
applying the classical maximum likelihood approach. In particular, we show
that the corresponding MLE has nice nonasymptotic properties. Later, we
indicate how one can extend these nice results to the general nonhomogeneous
situation.

2.1. Parametric modeling. A time-homogeneous (time-homoskedastic)
model means that $\sigma_t$ is a constant. The process $S_t$ is then a Geometric
Brownian motion observed at discrete time moments. For the homogeneous
model $\sigma_t^2 \equiv \theta$ with $t \in I$, the squared returns $Y_t = R_t^2$ follow the equation
$Y_t = \theta \varepsilon_t^2$, and the parameter $\theta$ can be estimated using the maximum
likelihood method

$$\tilde\theta_I = \operatorname*{argmax}_{\theta} L_I(\theta) = \operatorname*{argmax}_{\theta} \sum_{t \in I} \ell(Y_t, \theta),$$

where $\ell(y, \theta) = -(1/2)\log(2\pi\theta) - y/(2\theta)$ is the log-density of the normal
distribution with the parameters $(0, \theta)$. This yields

(2.2) $$L_I(\theta) = -(N_I/2)\log(2\pi\theta) - S_I/(2\theta),$$

where $N_I$ denotes the number of time points in $I$ and $S_I = \sum_{t \in I} Y_t$.

The volatility model is a particular case of an exponential family, so that
a closed form representation for the maximum likelihood estimate $\tilde\theta_I$ and
for the corresponding fitted log-likelihood $L_I(\tilde\theta_I)$ is available [see Polzehl
and Spokoiny (2006) for more details].

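Theorem 2.1. For every interval $I$,

$$\tilde\theta_I = S_I/N_I = N_I^{-1} \sum_{t \in I} Y_t.$$

Moreover, for every $\theta > 0$, the fitted likelihood ratio $L_I(\tilde\theta_I, \theta) = \max_{\theta'} L_I(\theta', \theta)$,
with $L_I(\theta', \theta) = L_I(\theta') - L_I(\theta)$, satisfies

(2.3) $$L_I(\tilde\theta_I, \theta) = N_I \mathcal{K}(\tilde\theta_I, \theta),$$

where

$$\mathcal{K}(\theta', \theta) = -\{\log(\theta'/\theta) + 1 - \theta'/\theta\}/2$$

is the Kullback-Leibler information for the two normal distributions with
variances $\theta'$ and $\theta$.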

Proof. Both results follow by simple algebra from (2.2). □ 
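The identity (2.3) can be checked numerically. The following is a minimal sketch, assuming plain Python lists of returns; the function names and the toy return values are hypothetical and serve only as an illustration of Theorem 2.1.

```python
import math

def mle_variance(returns):
    """Local MLE of the variance: the mean of squared returns (Theorem 2.1)."""
    squared = [r * r for r in returns]
    return sum(squared) / len(squared)

def log_likelihood(returns, theta):
    """Gaussian log-likelihood L_I(theta) as in (2.2)."""
    n = len(returns)
    s = sum(r * r for r in returns)
    return -0.5 * n * math.log(2.0 * math.pi * theta) - s / (2.0 * theta)

def kl(theta1, theta2):
    """Kullback-Leibler information K(theta1, theta2) for N(0, theta1) vs N(0, theta2)."""
    ratio = theta1 / theta2
    return -0.5 * (math.log(ratio) + 1.0 - ratio)

# Hypothetical toy returns, used only to illustrate identity (2.3).
returns = [0.012, -0.007, 0.021, -0.015, 0.003, -0.009]
theta_hat = mle_variance(returns)
theta = 2.5e-4  # an arbitrary comparison value, not taken from the paper

# Fitted likelihood ratio L_I(theta_hat) - L_I(theta); by (2.3) it should
# agree with N_I * K(theta_hat, theta) up to rounding error.
fitted_ratio = log_likelihood(returns, theta_hat) - log_likelihood(returns, theta)
print(fitted_ratio, len(returns) * kl(theta_hat, theta))
```

The two printed numbers should coincide up to floating point error, which is the content of (2.3) for this particular sample.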

Remark 2.1. The assumption of normality for the innovations $\varepsilon_t$ is often
criticized in the financial literature. Our empirical examples in Section
5.2 below also indicate that the tails of estimated innovations are heavier
than normality would imply. However, the estimate $\tilde\theta_I$ remains meaningful
even for nonnormal innovations; it is then just a quasi-likelihood approach.

Theorem 2.2 [Polzehl and Spokoiny (2006)]. Let $f(t) = \theta^*$ for $t \in I$. If
the innovations $\varepsilon_t$ are i.i.d. standard normal, then, for any $\mathfrak{z} > 0$,

$$\mathbb{P}_{\theta^*}(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}) = \mathbb{P}_{\theta^*}(N_I \mathcal{K}(\tilde\theta_I, \theta^*) > \mathfrak{z}) \le 2e^{-\mathfrak{z}}.$$

The result can be extended to the case of nonnormal innovations $\varepsilon_t$ under
the condition of bounded exponential moments for $\varepsilon_t^2$. The general case can
be reduced to this one by some data transformation [see Chen and Spokoiny
(2007) for details].

The Kullback-Leibler divergence $\mathcal{K}$ fulfills $\mathcal{K}(\theta', \theta^*) \le I^* |\theta' - \theta^*|^2$ for any
point $\theta'$ in a neighborhood of $\theta^*$, where $I^*$ is the maximum of the Fisher
information over this neighborhood. Therefore, the result of Theorem 2.2
guarantees that $|\tilde\theta_I - \theta^*| \le C N_I^{-1/2}$ with a high probability. Theorem 2.2
can be used for constructing the confidence intervals for the parameter $\theta^*$.

Theorem 2.3. If $\mathfrak{z}_\alpha$ satisfies $2e^{-\mathfrak{z}_\alpha} \le \alpha$, then

$$\mathcal{E}_\alpha = \{\theta : N_I \mathcal{K}(\tilde\theta_I, \theta) \le \mathfrak{z}_\alpha\}$$

is an $\alpha$-confidence set for the parameter $\theta^*$.
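As an illustration of Theorem 2.3, the confidence set can be computed explicitly because the Kullback-Leibler information is convex in the ratio of the two variances. The sketch below is only one possible way to do this: it takes the smallest admissible threshold, log(2/alpha), and finds the two end points by bisection. The helper names and the numerical inputs are hypothetical.

```python
import math

def kl(theta1, theta2):
    """Kullback-Leibler information K(theta1, theta2) for N(0, theta1) vs N(0, theta2)."""
    ratio = theta1 / theta2
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def _bisect(func, lo, hi, iters=200):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(theta_hat, n, alpha):
    """End points of {theta : n * K(theta_hat, theta) <= z_alpha}.

    z_alpha = log(2 / alpha) is the smallest value with 2 * exp(-z_alpha) <= alpha.
    """
    level = math.log(2.0 / alpha) / n

    def g(u):
        # u = theta_hat / theta; g(u) = K(theta_hat, theta) - z_alpha / n
        return 0.5 * (u - 1.0 - math.log(u)) - level

    u_lo = _bisect(g, 1e-300, 1.0)   # root below u = 1
    hi = 2.0
    while g(hi) < 0:                 # expand the bracket for the root above u = 1
        hi *= 2.0
    u_hi = _bisect(g, 1.0, hi)
    return theta_hat / u_hi, theta_hat / u_lo

# Hypothetical inputs: estimate 1.0 from 100 observations, alpha = 0.05.
lower, upper = confidence_interval(1.0, 100, 0.05)
print(lower, upper)
```

The resulting interval is not symmetric around the estimate, which reflects the asymmetry of the Kullback-Leibler information in the variance parameter.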

Theorem 2.2 claims that the estimation loss measured by $\mathcal{K}(\tilde\theta_I, \theta^*)$ is
with high probability bounded by $\mathfrak{z}/N_I$, provided that $\mathfrak{z}$ is sufficiently large.
Similarly, one can establish a risk bound for a power loss function.

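Theorem 2.4. Let $R_t$ be i.i.d. from $\mathcal{N}(0, \theta^*)$. Then, for any $r > 0$ and
any interval $I$,

$$\mathbb{E}_{\theta^*}|L_I(\tilde\theta_I, \theta^*)|^r = \mathbb{E}_{\theta^*}|N_I \mathcal{K}(\tilde\theta_I, \theta^*)|^r \le \mathfrak{r}_r,$$

where $\mathfrak{r}_r = 2r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} e^{-\mathfrak{z}} \, d\mathfrak{z} = 2r\Gamma(r)$. Moreover, there exists a constant $C_r$
depending on $r$ only such that for any $\mathfrak{z} \ge 1$ and any other interval $I'$

$$\mathbb{E}_{\theta^*}|L_I(\tilde\theta_I, \theta^*)|^r \mathbf{1}(L_{I'}(\tilde\theta_{I'}, \theta^*) > \mathfrak{z}) \le C_r \mathfrak{z}^r e^{-\mathfrak{z}}.$$

Proof. By Theorem 2.2,

$$\mathbb{E}_{\theta^*}|L_I(\tilde\theta_I, \theta^*)|^r \le -\int_{\mathfrak{z} \ge 0} \mathfrak{z}^r \, d\mathbb{P}_{\theta^*}(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z})
\le r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} \mathbb{P}_{\theta^*}(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}) \, d\mathfrak{z} \le 2r \int_{\mathfrak{z} \ge 0} \mathfrak{z}^{r-1} e^{-\mathfrak{z}} \, d\mathfrak{z}$$

and the first assertion is fulfilled. Similarly, one can show that

$$\mathbb{E}_{\theta^*}|L_I(\tilde\theta_I, \theta^*)|^r \mathbf{1}(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}) \le C_r \mathfrak{z}^r e^{-\mathfrak{z}},$$

where $C_r$ depends on $r$ only. It remains to note that

$$|L_I(\tilde\theta_I, \theta^*)|^r \mathbf{1}(L_{I'}(\tilde\theta_{I'}, \theta^*) > \mathfrak{z})
\le |L_I(\tilde\theta_I, \theta^*)|^r \mathbf{1}(L_I(\tilde\theta_I, \theta^*) > \mathfrak{z}) + |L_{I'}(\tilde\theta_{I'}, \theta^*)|^r \mathbf{1}(L_{I'}(\tilde\theta_{I'}, \theta^*) > \mathfrak{z}).$$ □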

2.2. Risk of estimation in nonparametric situation. "Small modeling bias"
condition. This section extends the bound of Theorem 2.4 to the nonparametric
model $R_t^2 = f(t)\varepsilon_t^2$ when the function $f(\cdot)$ is no longer constant
even in a vicinity of the reference point $t^\diamond$. We, however, suppose that the
function $f(\cdot)$ can be well approximated by a constant $\theta$ at all points $t \in I$.

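Let $Z_\theta = d\mathbb{P}/d\mathbb{P}_\theta$ be the likelihood ratio of the underlying measure $\mathbb{P}$
with regard to the parametric measure $\mathbb{P}_\theta$ corresponding to the constant
parameter $f(\cdot) \equiv \theta$. Then,

$$\log Z_\theta = \sum_t \log \frac{p(Y_t, f(t))}{p(Y_t, \theta)}.$$

If we restrict our analysis to an interval $I$ and denote by $\mathbb{P}_I$, respectively,
$\mathbb{P}_{I,\theta}$ the measure corresponding to the observations $Y_t$ for $t \in I$, then in a
similar way

$$\log Z_{I,\theta} := \log \frac{d\mathbb{P}_I}{d\mathbb{P}_{I,\theta}} = \sum_{t \in I} \log \frac{p(Y_t, f(t))}{p(Y_t, \theta)}.$$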

To measure the quality of the approximation of the underlying measure $\mathbb{P}_I$
by the parametric measure $\mathbb{P}_{I,\theta}$, define

(2.4) $$\Delta_I(\theta) = \sum_{t \in I} \mathcal{K}(f(t), \theta),$$

where $\mathcal{K}(f(t), \theta)$ means the Kullback-Leibler distance between two parameter
values $f(t)$ and $\theta$.

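Let $\varrho(\hat\theta_I, \theta)$ be a loss function for an estimate $\hat\theta_I$ constructed from the
observations $Y_t$, for $t \in I$. Define also the corresponding risk under the
parametric measure $\mathbb{P}_\theta$:

$$\mathcal{R}(\hat\theta_I, \theta) = \mathbb{E}_\theta \varrho(\hat\theta_I, \theta).$$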

The next result explains how the risk bounds can be translated from the 
parametric to the nonparametric situations. 

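Theorem 2.5. Let, for some $\theta \in \Theta$ and some $\Delta \ge 0$,

(2.5) $$\mathbb{E}\Delta_I(\theta) \le \Delta.$$

Then, it holds for any estimate $\hat\theta$ measurable with regard to $\mathcal{F}_I$

$$\mathbb{E}\log(1 + \varrho(\hat\theta, \theta)/\mathcal{R}(\hat\theta, \theta)) \le \Delta + 1.$$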

Proof. The proof is based on the following general result. 

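Lemma 2.6. Let $\mathbb{P}$ and $\mathbb{P}_0$ be two measures such that the Kullback-Leibler
divergence $\mathbb{E}\log(d\mathbb{P}/d\mathbb{P}_0)$ satisfies

$$\mathbb{E}\log(d\mathbb{P}/d\mathbb{P}_0) \le \Delta < \infty.$$

Then, for any random variable $\zeta$ with $\mathbb{E}_0\zeta < \infty$,

$$\mathbb{E}\log(1 + \zeta) \le \Delta + \mathbb{E}_0\zeta.$$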



Proof. By simple algebra one can check that, for any fixed $y$, the maximum
of the function $f(x) = xy - x\log x + x$ is attained at $x = e^y$, leading
to the inequality $xy \le x\log x - x + e^y$. Using this inequality and the
representation $\mathbb{E}\log(1 + \zeta) = \mathbb{E}_0\{Z\log(1 + \zeta)\}$ with $Z = d\mathbb{P}/d\mathbb{P}_0$, we obtain

$$\mathbb{E}\log(1 + \zeta) = \mathbb{E}_0\{Z\log(1 + \zeta)\}
\le \mathbb{E}_0(Z\log Z - Z) + \mathbb{E}_0(1 + \zeta)
= \mathbb{E}_0(Z\log Z) + \mathbb{E}_0\zeta - \mathbb{E}_0 Z + 1.$$

It remains to note that $\mathbb{E}_0 Z = 1$ and $\mathbb{E}_0(Z\log Z) = \mathbb{E}\log Z$. □

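We now apply this lemma with $\zeta = \varrho(\hat\theta, \theta)/\mathcal{R}(\hat\theta, \theta)$ and note that $\mathbb{E}_0\zeta =
\mathbb{E}_\theta\varrho(\hat\theta, \theta)/\mathcal{R}(\hat\theta, \theta) = 1$. This yields

$$\mathbb{E}\log Z_{I,\theta} = \mathbb{E}\sum_{t \in I} \log \frac{p(Y_t, f(t))}{p(Y_t, \theta)} = \mathbb{E}\sum_{t \in I} \mathcal{K}(f(t), \theta) = \mathbb{E}\Delta_I(\theta) \le \Delta$$

and the result follows. □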

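This result implies that the bound for the risk of estimation $\mathbb{E}_\theta|L_I(\tilde\theta_I, \theta)|^r =
\mathbb{E}_\theta|N_I \mathcal{K}(\tilde\theta_I, \theta)|^r$ under the parametric hypothesis can be extended to the
nonparametric situation provided that the value $\Delta_I(\theta)$ is sufficiently small. For
$r > 0$, define $\varrho(\tilde\theta_I, \theta) = |N_I \mathcal{K}(\tilde\theta_I, \theta)|^r$. By Theorem 2.4, $\mathcal{R}(\tilde\theta_I, \theta) = \mathbb{E}_\theta\varrho(\tilde\theta_I, \theta) \le \mathfrak{r}_r$.

Corollary 2.7. Let (2.5) hold for some $\theta$. For any $r > 0$,

$$\mathbb{E}\log(1 + |N_I \mathcal{K}(\tilde\theta_I, \theta)|^r/\mathfrak{r}_r) \le \Delta + 1.$$

This result means that in the nonparametric situation under the condition
(2.5) with some fixed $\Delta$, the losses $|N_I \mathcal{K}(\tilde\theta_I, \theta)|^r$ are stochastically bounded.
Note that this result applies even if $\Delta$ is large; however, the bound is
proportional to $e^{\Delta + 1}$ and grows exponentially with $\Delta$.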

2.3. "Small modeling bias" condition and rate of estimation. This section
briefly comments on relations between the results of Section 2.2 and
the usual rate results under smoothness conditions on the function $f(\cdot)$.

Let $n$ be the parameter meaning length of the largest considered historical
interval. More precisely, we assume that the function $f(\cdot)$ is smooth in the
sense that, for $\theta^* = f(t^\diamond)$ and any $t \ge t^\diamond - n$,

(2.6) $$\mathcal{K}^{1/2}(f(t), \theta^*) \le (t^\diamond - t)/n.$$

In view of the inequality $\mathcal{K}(\theta, \theta') \asymp |\theta/\theta' - 1|^2$, this condition is equivalent
to the usual Lipschitz property of the rescaled function $f(t/n)$. This condition
bounds the bias of approximating the underlying function $f(t)$ by a
constant $f(t^\diamond)$ by $(t^\diamond - t)/n$. The variance of the estimate $\tilde\theta_I$ for $I = [t, t^\diamond[$
is proportional to $1/(t^\diamond - t)$. The usual "bias-variance trade-off" means the
relation "bias$^2 \asymp$ variance," leading to $(t^\diamond - t)^3 \asymp n^2$.
Now note that (2.4) and (2.6) imply

$$\Delta_I(\theta^*) \le N_I^3/n^2.$$

Therefore, the "small modeling bias" condition $\Delta_I(\theta) \le \Delta$ is essentially
equivalent to "bias-variance trade-off." Moreover, combined with the result
of Corollary 2.7, this condition leads to the following classical rate results.

Theorem 2.8. Assume (2.6). Select $I$ such that $N_I = c n^{2/3}$, for some
$c > 0$. Then, (2.5) holds with $\Delta = c^3$ and, for any $r > 0$,

$$\mathbb{E}\log(1 + |N_I \mathcal{K}(\tilde\theta_I, \theta^*)|^r/\mathfrak{r}_r) \le c^3 + 1.$$

This corresponds to the classical accuracy of nonparametric estimation 
for the Lipschitz functions [cf. Fan, Farmen and Gijbels (1998)]. 
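The arithmetic behind this choice of the interval length is short enough to sketch. Under the smoothness condition (2.6), the modeling bias on an interval of length N is at most the sum of squared relative distances to the reference point, which in turn is at most N cubed over n squared. The function names and the numerical values below are hypothetical.

```python
def interval_length(n, c=1.0):
    """Interval length N_I = c * n^(2/3) balancing bias and variance."""
    return c * n ** (2.0 / 3.0)

def modeling_bias_bound(n_interval, n):
    """Upper bound N_I^3 / n^2 for the modeling bias under condition (2.6)."""
    return n_interval ** 3 / n ** 2

def summed_bias(n_interval, n):
    """Sum over the interval of ((distance to the reference point) / n)^2."""
    return sum((j / n) ** 2 for j in range(1, n_interval + 1))

# Hypothetical values: largest horizon n = 1000 and constant c = 2.
n = 1000
length = interval_length(n, c=2.0)          # about 200 observations
print(length, modeling_bias_bound(length, n))  # the bound is about c^3 = 8
print(summed_bias(200, n))                     # the exact sum is smaller
```

The exact sum is roughly one third of the bound, so the constant in Theorem 2.8 is conservative in this illustration.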

3. Adaptive volatility estimation. The assumption of time homogeneity
is too restrictive in practical applications and does not allow one to fit real data
well. In this paper we consider an approach based on the local parametric
assumption, which means that, for every time moment $t^\diamond$, there exists a
historical time interval $[t^\diamond - m, t^\diamond[$ in which the volatility process $\sigma_t$ is nearly
constant. Under such a modeling, the main intention is both to describe the
interval of homogeneity and to estimate the corresponding value $\sigma_{t^\diamond}$.

Our approach is based on the adaptive choice of the interval of homogeneity
for the fixed end point $t^\diamond$. This choice is made by the local (multiscale)
change point detection (LCP) algorithm described below. The procedure
attempts to find this interval from the data by successive testing of the
hypothesis of homogeneity. An interval-candidate is accepted if every point is
negatively tested on a possible location of a change point. A change point
test at a location $\tau < t^\diamond$ compares two different estimates of the parameter
$\theta$; one is computed from the most recent interval $[\tau, t^\diamond[$ while the other
one is obtained by using another interval $[t', \tau[$ before the possible jump.
The procedure is multiscale in the sense that the choice of the other interval
$[t', \tau[$ and the critical value of the test depend on the distance of
the testing point $\tau$ from the reference point $t^\diamond$. More precisely, let a growing
sequence of numbers $N_1 < N_2 < \cdots < N_K$ be fixed. Each $N_k$ means the
scale parameter describing the length of the historical time interval screened
at the step $k$. Define a family $\{I_k, k = 1, \ldots, K\}$ of nested intervals of the
form $I_k = [t^\diamond - N_k, t^\diamond[$ with the right edge at $t^\diamond$. The procedure starts from
the smallest interval $I_1$ by testing the hypothesis of homogeneity within $I_1$
against a change point alternative. If the hypothesis is not rejected then we
take the next larger interval $I_2$ and test for a change point. We continue
this way until we detect a change point or the largest considered interval $I_K$
is accepted. If the hypothesis of homogeneity within some interval $I_k$ is
rejected and a change point is detected at a point $\hat\tau \in I_k$, then the estimated
interval of homogeneity is defined as the latest accepted interval, that is,
$\hat{I} = I_{k-1} = [t^\diamond - N_{k-1}, t^\diamond[$, otherwise we take $\hat{I} = I_K$. Finally, we define the
estimate $\hat{f}(t^\diamond) = \hat\theta$ of the volatility parameter $f(t^\diamond) = \sigma_{t^\diamond}^2$ as $\hat{f}(t^\diamond) = \tilde\theta_{\hat{I}}$. The
main ingredient of this procedure is the homogeneity test which is described
in the next section.



3.1. Test of homogeneity against a change point alternative. Let $J$ be
a tested interval which has to be checked on a possible change point. For
carrying out the test, we also assume a larger testing interval $I = [t', t''[$ to
be fixed. The hypothesis of homogeneity for $J$ means that the observations
$R_t$ follow the parametric model with the parameter $\theta$ for $J$ itself and for
the larger interval $I$. This hypothesis leads to the parametric log-likelihood
$L_I(\theta)$ for the observations $R_t$, $t \in I$. We want to test this hypothesis against
a change point alternative that the parameter $\theta$ spontaneously changes in
some internal point $\tau$ of the interval $J$. Every point $\tau \in J$ splits the interval
$I = [t', t''[$ into two subintervals, $I'' = I''_\tau = [\tau, t''[$ and $I' = I'_\tau = I \setminus I'' =
[t', \tau[$ (see Figure 1). The change point alternative means that $f(t) = \theta''$ for
$t \in I''$ and $f(t) = \theta'$ for $t \in I'$ for some $\theta'' \ne \theta'$. This corresponds to the
log-likelihood $L_{I''}(\theta'') + L_{I'}(\theta')$. The likelihood ratio test statistic for the
change point alternative with the change point location at the point $\tau$ is of
the form

(3.1) $$T_{I,\tau} = \max_{\theta'', \theta'}\{L_{I''}(\theta'') + L_{I'}(\theta')\} - \max_{\theta} L_I(\theta)
= \min_{\theta}\max_{\theta'', \theta'}\{L_{I''}(\theta'', \theta) + L_{I'}(\theta', \theta)\}.$$

For the considered volatility model, this test statistic can be represented in
a simple form given by the next lemma.

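Lemma 3.1. It holds for any interval $I$ and point $\tau \in I$,

(3.2) $$T_{I,\tau} = N_{I''}\mathcal{K}(\tilde\theta_{I''}, \tilde\theta_I) + N_{I'}\mathcal{K}(\tilde\theta_{I'}, \tilde\theta_I)$$

with $I' = [t', \tau[$ and $I'' = [\tau, t''[$.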

Proof. By (2.3), minimization in (3.1) with regard to $\theta$ leads to the
choice $\theta = \tilde\theta_I$. Similarly, maximization with regard to $\theta'$ and $\theta''$ leads to
$\theta' = \tilde\theta_{I'}$ and $\theta'' = \tilde\theta_{I''}$, and the assertion follows. □

Fig. 1. Intervals involved in the change point test.

The change point test for the interval $J$ is defined as the maximum of the
test statistics $T_{I,\tau}$ over $\tau \in J$:

(3.3) $$T_{I,J} = \max_{\tau \in J} T_{I,\tau}.$$

The change point test compares this statistic with the critical value $\mathfrak{z}$ which
may depend on the intervals $I$, $J$. The hypothesis of homogeneity is rejected
if $T_{I,J} > \mathfrak{z}$; in this case the estimate of the change point location is defined
as $\hat\tau = \operatorname{argmax}_{\tau \in J} T_{I,\tau}$.
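The test of this section reduces to a few averages, which the following sketch makes explicit. It assumes the squared returns on the testing interval are given as a Python list in time order and that a candidate change point is identified by the index at which the list is split. The function names, the toy data and the critical value are hypothetical.

```python
import math

def kl(theta1, theta2):
    """Kullback-Leibler information K(theta1, theta2) for N(0, theta1) vs N(0, theta2)."""
    ratio = theta1 / theta2
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def change_point_statistic(y, split):
    """Statistic (3.2): y holds squared returns on I, I' = y[:split], I'' = y[split:]."""
    left, right = y[:split], y[split:]
    theta_all = sum(y) / len(y)
    theta_left = sum(left) / len(left)
    theta_right = sum(right) / len(right)
    return (len(right) * kl(theta_right, theta_all)
            + len(left) * kl(theta_left, theta_all))

def change_point_test(y, candidates, critical_value):
    """Statistic (3.3), the maximizing split, and the rejection decision."""
    stats = {split: change_point_statistic(y, split) for split in candidates}
    best_split = max(stats, key=stats.get)
    return stats[best_split], best_split, stats[best_split] > critical_value

# Hypothetical squared returns with a level shift after the sixth observation.
y = [0.5, 1.5, 1.0, 0.8, 1.2, 1.0, 9.0, 11.0, 10.0, 12.0, 8.0, 10.0]
statistic, split_hat, rejected = change_point_test(y, range(2, 11), critical_value=2.0)
print(statistic, split_hat, rejected)
```

In this toy example the maximizing split coincides with the position of the level shift; the critical value 2.0 is arbitrary and is not calibrated as described in Section 3.3.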

Remark 3.1. The change point alternative suggested above is only one 
possibility to test the homogeneity assumptions. One can apply many other 
tests, for example, omnibus tests against polynomials or trigonometric functions
[see, e.g., Hart (1998)]. Our choice is motivated by several reasons.
First of all, it is simple to implement and does not require a special model 
estimation under alternative because the alternative reduces to the null hy- 
pothesis for two smaller intervals. Secondly, it has a natural interpretation 
and delivers additional information about the location of the change and 
the length of the interval of homogeneity. Finally, it was shown in Ingster 
(1986) [see also Horowitz and Spokoiny (2001)] that a test based on the local 
constant alternative is powerful against smooth alternatives as well. 

3.2. The multiscale procedure. This section describes the LCP procedure.
The procedure is sequential and consists of $K$ steps corresponding to
the given growing sequence of numbers $N_0 < N_1 < \cdots < N_K$. This sequence
determines the sequence of nested intervals $I_0 \subset I_1 \subset \cdots \subset I_K$ with the right
edge at the point of estimation $t^\diamond$: $I_k = [t_k, t^\diamond[ = [t^\diamond - N_k, t^\diamond[$ (see Figure 2).
This set of intervals leads to the set of estimates $\tilde\theta_{I_k}$, $k = 0, 1, \ldots, K$.
Obviously, $N_{I_k} = N_k$. For conciseness of notation, we also write $\tilde\theta_k$ in place of
$\tilde\theta_{I_k}$.

The proposed adaptive method chooses an index $\hat\kappa$ or, equivalently, the
estimate $\tilde\theta_{\hat\kappa}$ from this set. The procedure is sequential and successively
checks the intervals $I_0, I_1, \ldots, I_k, \ldots$ for change points.

Fig. 2. The intervals $I_k$ and $J_k$ for the LCP procedure.

The interval $I_0$ is always accepted and the procedure starts with $k = 1$.
At every step $k$, every point of the interval $J_k = I_k \setminus I_{k-1}$ is tested as a
potential change point due to the procedure from Section 3.1. The testing
interval $I = I_{k+1}$ is applied. $I_k$ is accepted if the previous interval $I_{k-1}$ was
accepted and the test statistic $T_k := T_{I_{k+1}, J_k}$ defined by (3.3) does not exceed
the critical value $\mathfrak{z}_k$. The latter means that there is no change point detected
within $J_k$. Equivalently, $I_k$ is accepted if every point is negatively tested on
a change point location. The event $\{I_k \text{ is rejected}\}$ means that $T_l > \mathfrak{z}_l$ for
some $l \le k$, and hence, a change point has been detected in the first $k$ steps
of the procedure at some point within $I_k$. For every $k$, we define an index
$\hat\kappa_k$ corresponding to the largest accepted interval after the first $k$ steps, and
$\hat\theta_k = \tilde\theta_{\hat\kappa_k}$ is the corresponding estimate. The estimates $\hat\theta_k$ and $\tilde\theta_k$ coincide if
no change point is detected at the first $k$ steps. The final estimate is defined
as $\hat\theta = \hat\theta_K$ and corresponds to the largest found interval of homogeneity. The
formal definition reads as follows:

$$\hat\kappa = \max\{k \le K : T_l \le \mathfrak{z}_l, \ l = 1, \ldots, k\}, \qquad \hat\theta = \tilde\theta_{\hat\kappa}.$$

The way of choosing the critical value as well as the other parameters of
the procedure, like the intervals $I_k$, is discussed in the next section.
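The sequential procedure of this section can be sketched in a few lines. The sketch assumes that the squared returns are stored in a Python list with the most recent observation last, and that the list of scales contains one extra entry beyond the last candidate interval so that a testing interval exists at the final step; this extra scale is an assumption of the sketch. The function names, scales, critical values and data are hypothetical.

```python
import math

def kl(theta1, theta2):
    """Kullback-Leibler information K(theta1, theta2) for N(0, theta1) vs N(0, theta2)."""
    ratio = theta1 / theta2
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def change_point_statistic(y, split):
    """Statistic (3.2) for the split I' = y[:split], I'' = y[split:]."""
    left, right = y[:split], y[split:]
    theta_all = sum(y) / len(y)
    return (len(right) * kl(sum(right) / len(right), theta_all)
            + len(left) * kl(sum(left) / len(left), theta_all))

def lcp_estimate(y, scales, critical_values):
    """Return (estimate, index of the largest accepted interval).

    y               squared returns, most recent last
    scales          [N_0, ..., N_K, N_{K+1}], increasing; the last entry is
                    used only as a testing interval (assumption of this sketch)
    critical_values [z_1, ..., z_K]
    """
    num_steps = len(scales) - 2
    accepted = 0  # the smallest interval I_0 is always accepted
    for k in range(1, num_steps + 1):
        window = y[-scales[k + 1]:]  # testing interval I_{k+1}
        # candidate change points in J_k lie m steps back, N_{k-1} < m <= N_k
        t_k = max(change_point_statistic(window, len(window) - m)
                  for m in range(scales[k - 1] + 1, scales[k] + 1))
        if t_k > critical_values[k - 1]:
            break  # change point detected: keep the last accepted interval
        accepted = k
    chosen = y[-scales[accepted]:]
    return sum(chosen) / len(chosen), accepted

# Hypothetical data: an old regime with level about 10, a recent one about 1.
y_break = [9.0, 11.0] * 10 + [0.8, 1.2] * 4
scales = [2, 4, 8, 16, 24]
theta_hat, k_hat = lcp_estimate(y_break, scales, [2.0, 2.0, 2.0])
print(theta_hat, k_hat)
```

With these toy inputs the procedure stops at the second step, because the candidate points of the second interval include the position of the level shift, and the estimate is the average over the last accepted interval.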

3.3. Choice of the parameters $\mathfrak{z}_k$ using "propagation" condition. The
"critical value" $\mathfrak{z}_k$ defines the level of significance for the test statistic
$T_k = T_{I_{k+1}, J_k}$. A proper choice of the parameters $\mathfrak{z}_k$ is crucial for the
performance of the procedure. We propose in this section one general approach
for selecting the $\mathfrak{z}_k$'s, which is similar to the bootstrap idea in the hypothesis
testing problem. Indeed, the proposed procedure can be viewed as a
multiple test with scale dependent critical values. We select these values
to provide the prescribed performance of the procedure in the parametric
situation (under the null hypothesis). In the classical testing approach, the
performance of the method is measured by the errors of the first and second
kind, and the critical value is selected to ensure the prescribed test level,
which is the probability of rejecting the null under the null hypothesis. In
the considered framework, the null hypothesis means a time-homogeneous
model with a constant volatility $\theta^*$. We apply a slightly modified condition
on the first kind error which better suits the considered estimation problem.
Our primary goal is to select one estimate out of the family $\tilde\theta_k$, rather than
to test for a change point. In the homogeneous situation, the optimal choice
corresponds to the largest interval $I_K$ leading to the estimate $\tilde\theta_K$ with the
smallest variability in the considered family (see Theorem 2.4). A "false
alarm" means that a nonexisting change point is detected, which leads to
selecting an estimate with a larger variability than that of $\tilde\theta_K$. Our condition
accounts not only for the frequency of false alarms but also for the step $k$
at which a "false alarm" occurs. Before giving a precise formulation, we mention one
important and helpful property of the volatility parametric model $f(\cdot) \equiv \theta^*$:
for any intervals $J \subset I$, the distribution of the test statistic $T_{I,J}$ does not
depend on the parameter value $\theta^*$. This is a simple corollary of the fact
that volatility is a scale parameter of the corresponding parametric family.
However, in view of its importance for our study we state it in a separate
lemma.

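Lemma 3.2. Let the return $R_t$ follow the parametric model with the
constant volatility parameter $\theta^*$, that is, $R_t^2 = \theta^*\varepsilon_t^2$. Then, for any $J \subset I$
and any $\tau \in J$, the distribution of the test statistics $T_{I,\tau}$ and $T_{I,J}$ under $\mathbb{P}_{\theta^*}$
is the same for all $\theta^* > 0$.

Proof. It suffices to notice that for every interval $I$ the estimate $\tilde\theta_I$ can
be represented under $\mathbb{P}_{\theta^*}$ as

$$\tilde\theta_I = N_I^{-1}\sum_{t \in I} Y_t = \theta^* N_I^{-1}\sum_{t \in I} \varepsilon_t^2$$

and for each two intervals $I$ and $I'$ the Kullback-Leibler divergence $\mathcal{K}(\tilde\theta_I, \tilde\theta_{I'})$
is a function of the ratio $\tilde\theta_I/\tilde\theta_{I'}$. □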

The result of Lemma 3.2 allows us to reduce the parametric null situation
to the case of a simple null consisting of one point $\theta^*$, for example, $\theta^* = 1$.
The corresponding distribution of the observation under this measure will
be denoted by $\mathbb{P}_{\theta^*}$.
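The scale invariance behind Lemma 3.2 is easy to confirm numerically: multiplying all squared returns by a positive constant leaves every ratio of local averages, and hence the test statistic, unchanged. The following sketch uses hypothetical data and a hypothetical helper name.

```python
import math

def kl(theta1, theta2):
    """Kullback-Leibler information K(theta1, theta2) for N(0, theta1) vs N(0, theta2)."""
    ratio = theta1 / theta2
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def change_point_statistic(y, split):
    """Statistic (3.2) for the split I' = y[:split], I'' = y[split:]."""
    left, right = y[:split], y[split:]
    theta_all = sum(y) / len(y)
    return (len(right) * kl(sum(right) / len(right), theta_all)
            + len(left) * kl(sum(left) / len(left), theta_all))

# Hypothetical squared returns and an arbitrary positive scale factor.
y = [0.4, 1.7, 0.9, 1.1, 2.3, 0.2, 1.5, 0.8]
scaled = [7.5 * value for value in y]

stat = change_point_statistic(y, 3)
stat_scaled = change_point_statistic(scaled, 3)
print(stat, stat_scaled)  # equal up to rounding error
```

This is why critical values can be calibrated once, by simulation with unit variance, and then reused for any volatility level.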

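For every step $k$, we require that in the parametric situation $f(\cdot) \equiv \theta^*$ the
estimate $\hat\theta_k$ is sufficiently close to the "oracle" estimate $\tilde\theta_k$ in the sense that

(3.4) $$\mathbb{E}_{\theta^*}|N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)|^r \le \alpha\mathfrak{r}_r$$

for all $k = 1, \ldots, K$, where $\mathfrak{r}_r$ is the parametric risk bound from Theorem 2.4:
$\mathbb{E}_{\theta^*}|N_k \mathcal{K}(\tilde\theta_k, \theta^*)|^r \le \mathfrak{r}_r$.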

Note that $\hat\theta_k$ differs from $\tilde\theta_k$ only if a change point is detected at the
first $k$ steps. The usual condition to any change point detector is that such
"false alarms" occur with a small probability. Our condition (3.4) has the
same flavor but it is a bit stronger. Namely, a false alarm at an early stage of
the procedure is more crucial because it results in selecting an estimate with
a high variability. Our condition penalizes not only for occurrence of a false
alarm but also for the deviation of the selected estimate $\hat\theta_k$ from the optimal
estimate $\tilde\theta_k$. The choice of penalization is motivated by Theorem 2.4. A small
value of $N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)$ means that $\hat\theta_k$ belongs to the confidence set based on
the estimate $\tilde\theta_k$, that is, $\hat\theta_k$ does not differ significantly from $\tilde\theta_k$. On the
contrary, big values of $N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)$ indicate that $\hat\theta_k$ differs significantly from
$\tilde\theta_k$. The choice of the power loss $r$ in the condition (3.4) close to zero leads
back to counting the number of false alarms. Larger values of $r$ result in
criteria which also account for the deviation of $\hat\theta_k$ from $\tilde\theta_k$. We refer to (3.4)
as a "propagation" condition because it ensures that, under homogeneity
at every step, the current accepted interval $I_{k-1}$ extends to $I_k$ with a high
probability.

The values $\alpha$ and $r$ in (3.4) are two global parameters. The role of $\alpha$
is similar to the level of the test in the hypothesis testing problem, while
$r$ describes the power of the loss function. A specific choice is subjective
and depends on the particular application at hand. Taking a large $r$ and
small $\alpha$ would result in an increase of the critical values and, therefore,
improves the performance of the method in the parametric situation at the cost
of some loss of sensitivity to parameter changes. Theorem 4.1 presents some
upper bounds for the critical values $\mathfrak{z}_k$ as functions of $\alpha$ and $r$ in the form
$a_0\log K + 2\log(N_k/\alpha) + 2r\log(N_K/N_k)$, with some coefficient $a_0$. We see
that these bounds linearly depend on $r$ and on $\log\alpha^{-1}$. For our applications
to volatility estimation, we apply a relatively small value $r = 1/2$ which
makes the procedure more stable and robust against outliers. We also apply
$\alpha = 0.2$, although the other values in the range $[0.1, 1]$ lead to very similar
results.

The set of conditions (3.4) does not directly define the critical values $\mathfrak{z}_k$.
We present below one constructive method for selecting the $\mathfrak{z}_k$ so that the
"propagation" conditions (3.4) are fulfilled.

3.3.1. A sequential choice. Here we present a proposal for a sequential
choice of the $\mathfrak{z}_k$'s. Consider the situation after the first $k$ steps of the
algorithm. We distinguish between two cases. In the first, a change point is
detected at some step $l \le k$; in the other, no change point is detected.
In the first case, we denote by $B_l$ the event meaning the rejection at
step $l$, that is,

$B_l = \{T_1 \le \mathfrak{z}_1, \ldots, T_{l-1} \le \mathfrak{z}_{l-1}, T_l > \mathfrak{z}_l\}$

and $\hat\theta_k = \tilde\theta_{l-1}$ on $B_l$, $l = 1, \ldots, k$. The sequential choice of the critical values
$\mathfrak{z}_k$ is based on the decomposition

(3.5) $|\mathcal{K}(\tilde\theta_k, \hat\theta_k)|^r = \sum_{l=1}^{k} |\mathcal{K}(\tilde\theta_k, \tilde\theta_{l-1})|^r 1(B_l)$

for every $k \le K$. Now, observe that the event $B_l$ only depends on $\mathfrak{z}_1, \ldots, \mathfrak{z}_l$.
In particular, the event $B_1$ means that $T_1 > \mathfrak{z}_1$ and $\hat\theta_j = \tilde\theta_0$ for all $j \ge 1$. We
select $\mathfrak{z}_1$ as the minimal value providing that

(3.6) $\max_{k=1,\ldots,K} E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \tilde\theta_0)|^r 1(T_1 > \mathfrak{z}_1) \le \alpha \mathfrak{r}_r / K$.

Similarly, for every $l \ge 2$, select $\mathfrak{z}_l$ by considering the event $B_l = \{\hat{k} = l - 1\}$,
meaning that the first false alarm occurs at step $l$ and $\hat\theta_k = \tilde\theta_{l-1}$ for all
$k \ge l$. If $\mathfrak{z}_1, \ldots, \mathfrak{z}_{l-1}$ have already been fixed, the event $B_l$ is only controlled
by $\mathfrak{z}_l$, leading to the following condition on $\mathfrak{z}_l$: it is the minimal value that
ensures

(3.7) $\max_{k \ge l} E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \tilde\theta_{l-1})|^r 1(B_l) = \alpha \mathfrak{r}_r / K$.

Such a value $\mathfrak{z}_l$ can be found numerically by Monte Carlo simulations
from the parametric model $P_{\theta^*}$ for any fixed $\theta^*$ [see Lemma 3.2]. It is
straightforward to check that the $\mathfrak{z}_k$ defined in this way fulfill (3.4) in view of the
decomposition (3.5).
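The sequential Monte Carlo choice described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used for the reported results: the names `kl`, `path_statistics` and `calibrate` are hypothetical, the Kullback-Leibler divergence is taken to be the one between the centered normal laws $N(0,\theta)$ and $N(0,\theta')$, the test statistic is the split statistic $N_{I'}\mathcal{K}(\tilde\theta_{I'},\tilde\theta_I) + N_{I''}\mathcal{K}(\tilde\theta_{I''},\tilde\theta_I)$ maximized over split points in the tested interval, and the risk bound $\mathfrak{r}_r$ is replaced by a Monte Carlo estimate of the parametric risk.

```python
import math
import random

def kl(theta, theta_ref):
    """KL divergence between N(0, theta) and N(0, theta_ref); clipped at 0 for rounding."""
    ratio = theta / theta_ref
    return max(0.5 * (ratio - 1.0 - math.log(ratio)), 0.0)

def path_statistics(y, lengths):
    """Estimates theta_0..theta_K and statistics T_1..T_K for one simulated path.

    y[j] is the squared return observed j + 1 steps before the estimation point;
    lengths = [N_0, ..., N_{K+1}], the last interval serving only for testing.
    """
    K = len(lengths) - 2
    csum = [0.0]
    for value in y:
        csum.append(csum[-1] + value)          # csum[m] = sum of the m latest values
    theta = [csum[n] / n for n in lengths[:K + 1]]
    T = [0.0] * (K + 1)                        # T[0] is unused
    for k in range(1, K + 1):
        n_test = lengths[k + 1]                # testing interval for step k
        theta_test = csum[n_test] / n_test
        for m in range(lengths[k - 1] + 1, lengths[k] + 1):
            right = csum[m] / m                              # estimate after the split
            left = (csum[n_test] - csum[m]) / (n_test - m)   # estimate before the split
            stat = m * kl(right, theta_test) + (n_test - m) * kl(left, theta_test)
            T[k] = max(T[k], stat)
    return theta, T

def calibrate(lengths, alpha=0.2, r=0.5, n_paths=1000, seed=0):
    """Sequential Monte Carlo choice of z_1..z_K under theta* = 1."""
    rng = random.Random(seed)
    K = len(lengths) - 2
    sims = []
    for _ in range(n_paths):
        y = [rng.gauss(0.0, 1.0) ** 2 for _ in range(lengths[-1])]
        sims.append(path_statistics(y, lengths))
    # Assumption: the risk bound r_r is replaced by a Monte Carlo estimate.
    risk_r = max(
        sum((lengths[k] * kl(th[k], 1.0)) ** r for th, _ in sims) / n_paths
        for k in range(K + 1)
    )
    budget = alpha * risk_r / K
    z = [None] * (K + 1)
    alive = list(range(n_paths))               # paths without an alarm before step l
    for l in range(1, K + 1):
        ranked = sorted(alive, key=lambda i: -sims[i][1][l])
        run = [0.0] * (K + 1)                  # running empirical risk for each k >= l
        z[l] = sims[ranked[-1]][1][l]          # fallback: smallest statistic
        for i in ranked:
            th = sims[i][0]
            for k in range(l, K + 1):
                run[k] += (lengths[k] * kl(th[k], th[l - 1])) ** r / n_paths
            if max(run[l:]) > budget:          # an alarm on path i would exceed the budget
                z[l] = sims[i][1][l]           # so path i must not raise an alarm
                break
        alive = [i for i in alive if sims[i][1][l] <= z[l]]
    return z[1:], sims, risk_r

lengths = [5, 6, 7, 9, 12, 15, 19, 23]         # illustrative grid N_0, ..., N_{K+1}
z, sims, risk_r = calibrate(lengths)
print([round(v, 2) for v in z])
```

By construction, each step spends at most $\alpha\mathfrak{r}_r/K$ of the risk budget on the simulated sample, so the empirical version of (3.4) holds for every $k$ in view of the decomposition (3.5). A larger number of simulated paths would be needed for stable values in practice.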

3.3.2. Examples of choosing the intervals $\mathcal{I}_k$. To start the procedure, one
has to specify the set of intervals $\mathcal{I}_0, \mathcal{I}_1, \ldots, \mathcal{I}_K$. Note, however, that this
choice is not a part of the LCP procedure. The method applies to whatever
intervals are selected under condition (MD) (see Section 4). This section
presents one example which is at the same time the default choice for our
simulation study and applications.

The set $N_0, N_1, \ldots, N_K$ is defined geometrically by the rule $N_k = [N_0 a^k]$
for some fixed value $N_0$ and the growth rate $a > 1$. Such a proposal is
motivated by condition (MD) from the next section. Note also that the sets
$\mathcal{J}_k$ do not intersect for different $k$, and every point $\tau \in [t^* - N_k, t^* - N_0]$ is
tested as a possible location of the change point at one of the first $k$ steps
of the procedure.

Our numerical results (not reported here) indicate that the procedure is
quite stable with regard to the choice of parameters like $N_0$ and $a$. We
apply $a = 1.25$. Other values of $a$ in the range 1.1 to 1.3 lead to very
similar results. We also apply $N_0 = 5$, which is motivated by our applications
to risk management in financial engineering.
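As a small illustration of the geometric rule above, the following Python sketch generates the interval lengths $N_k = [N_0 a^k]$ for the default values $N_0 = 5$ and $a = 1.25$. The function name `interval_lengths` is hypothetical, and reading the bracket $[\cdot]$ as the integer part is an assumption; the grids reported in the figure captions may have been obtained with a different rounding convention.

```python
import math

def interval_lengths(n0=5, a=1.25, K=12):
    """Return [N_0, ..., N_K] with N_k = floor(N_0 * a**k) (rounding rule assumed)."""
    if n0 < 1 or a <= 1:
        raise ValueError("need N_0 >= 1 and growth rate a > 1")
    return [math.floor(n0 * a ** k) for k in range(K + 1)]

lengths = interval_lengths()
# Empirical range of N_{k-1} / N_k, to be compared with condition (MD).
ratios = [lengths[k - 1] / lengths[k] for k in range(1, len(lengths))]
print(lengths)
print(min(ratios), max(ratios))
```

The printed ratios give an idea of which constants $u_0$ and $u$ in condition (MD) are compatible with a given grid.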

4. Theoretical properties. This section discusses some useful theoretical
properties of the adaptively selected interval of homogeneity $\hat{\mathcal{I}}$ and then of
the adaptive volatility estimate $\hat\theta$ that corresponds to the selected interval
$\hat{\mathcal{I}}$, that is, $\hat\theta = \tilde\theta_{\hat{\mathcal{I}}}$. Our main "oracle" result claims that the final estimate $\hat\theta$
delivers essentially the same quality of estimation as the estimate with the
optimal ("ideal") choice of the interval $\mathcal{I}_{k^*}$. It is worth noting that the oracle
result does not assume any particular structure of the volatility function
$f(t)$. It can be an arbitrary predictable positive random process. Particular
cases include piecewise constant, smooth transition and other models. As
shown in Section 2.2, the oracle result automatically ensures the optimal
estimation rate under the usual smoothness conditions on the function $f(\cdot)$.

The "oracle" result is, in turn, a corollary of two important properties of
the procedure: "propagation" under homogeneity and "stability." The first
one means that in the nearly homogeneous situation, the procedure does
not terminate (no false alarm) with a high probability. In other words, if the
parametric (constant) approximation applies well in the interval $\mathcal{I}_k$, then
this interval will be accepted with a high probability. The "stability"
property ensures that the estimation quality will not essentially deteriorate in
the steps "after propagation" when the local constant approximation is not
sufficiently accurate. Typically, the procedure terminates in such situations.

The results require some regularity conditions on the growth of the
intervals $\mathcal{I}_k$. Namely, we require that the length $N_k$ of $\mathcal{I}_k$ grows exponentially
with $k$.

(MD) For some constants $u_0, u$ with $0 < u_0 \le u < 1$, it holds

$u_0 \le N_{k-1}/N_k \le u$.

In addition, we assume that the parameter set $\Theta$ satisfies the condition
($\Theta$) for some constant $a$ with $0 < a < 1$, and, for any $\theta_0, \theta \in \Theta$,

We start by discussing the behavior of the procedure in the time-homogeneous
situation with the constant volatility parameter $\theta^*$. In this case the
properties of the resulting estimate $\hat\theta$ are guaranteed by the condition (3.4).
Our first result claims the possibility of selecting the critical values $\mathfrak{z}_k$ to
provide (3.4) and states some upper bounds for the $\mathfrak{z}_k$'s. Similar results can
be stated in the local parametric situation when the homogeneity condition
$f(t) = \theta^*$ is only fulfilled for some time interval $I$.

4.1. Behavior under (local) homogeneity. First, we consider the homogeneous
situation with the constant parameter value $f(t) = \theta^*$. Our first result
presents an upper bound for the parameters $\mathfrak{z}_k$ which ensures condition (3.4).

Theorem 4.1. Assume (MD). Let $f(t) = \theta^*$, for all $t \in \mathcal{I}_K$. Then, there
is a constant $a_0 > 0$ depending on $r$ and $u_0, u$ such that the choice

(4.1) $\mathfrak{z}_k = a_0 \log K + 2\log(N_k/\alpha) + 2r\log(N_K/N_k)$

ensures (3.4), for all $k \le K$.
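To see how the bound (4.1) behaves as a function of the step $k$, it can be evaluated numerically. The sketch below is purely illustrative: the function name `critical_value_bound` is hypothetical, and the constant $a_0$ is not specified by the theorem, so the value `a0=1.0` is an arbitrary placeholder.

```python
import math

def critical_value_bound(lengths, k, alpha=0.2, r=0.5, a0=1.0):
    """Upper bound (4.1) for step k; lengths = [N_0, ..., N_K]; a0 is a placeholder."""
    K = len(lengths) - 1
    n_k, n_K = lengths[k], lengths[K]
    return a0 * math.log(K) + 2 * math.log(n_k / alpha) + 2 * r * math.log(n_K / n_k)

lengths = [5, 6, 7, 9, 12, 15, 19, 23]          # illustrative grid of interval lengths
bounds = [critical_value_bound(lengths, k) for k in range(1, len(lengths))]
print([round(b, 2) for b in bounds])
```

For $r < 1$ the bound grows with $k$ like $2(1-r)\log N_k$, while for $r = 1$ the two logarithmic terms combine to $2\log(N_K/\alpha)$ and the bound no longer depends on $k$.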

Remark 4.1. The present result only describes an upper bound for the
critical values, which will be used for our theoretical study. This upper bound
is not used for computing the values $\mathfrak{z}_k$ in practical applications. However,
it qualitatively describes how every critical value $\mathfrak{z}_k$ depends on the index $k$
and on the parameters $r$, $\alpha$.

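Proof of Theorem 4.1. Before proving the result of the theorem, we
present two useful technical lemmas. The first one shows that the maximum
test statistic $T_{I,J}$ is stochastically bounded in a rather strong sense.

Lemma 4.2. Let $J$, $I$ be tested and testing intervals, and $T_{I,J}$ be the test
statistic from (3.1). For any other interval $\mathcal{I}$ and any $\mathfrak{z} > 1$, it holds

$E_{\theta^*} |N_{\mathcal{I}} \mathcal{K}(\tilde\theta_{\mathcal{I}}, \theta^*)|^r 1(T_{I,J} > \mathfrak{z}) \le 2 N_J C_r \mathfrak{z}^r e^{-\mathfrak{z}/2}$,

where $C_r$ is the constant from Theorem 2.4.

Proof. Every $\tau \in J$ splits the interval $I$ into $I'$ and $I''$. For any interval
$\mathcal{I}$, by Theorem 2.4,

$E_{\theta^*} |N_{\mathcal{I}} \mathcal{K}(\tilde\theta_{\mathcal{I}}, \theta^*)|^r 1(T_{I,\tau} > \mathfrak{z})$

$\le E_{\theta^*} |N_{\mathcal{I}} \mathcal{K}(\tilde\theta_{\mathcal{I}}, \theta^*)|^r \{1(N_{I'} \mathcal{K}(\tilde\theta_{I'}, \theta^*) > \mathfrak{z}/2) + 1(N_{I''} \mathcal{K}(\tilde\theta_{I''}, \theta^*) > \mathfrak{z}/2)\}$

$\le 2 C_r \mathfrak{z}^r e^{-\mathfrak{z}/2}$.

Now, by definition of $T_{I,J}$,

$1(T_{I,J} > \mathfrak{z}) \le \sum_{\tau \in J} 1(T_{I,\tau} > \mathfrak{z})$

and the assertion follows. □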

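Below, we also utilize the metric-like property of the Kullback-Leibler
divergence $\mathcal{K}(\theta, \theta')$.

Lemma 4.3 [Polzehl and Spokoiny (2006), Lemma 5.2]. Under condition
($\Theta$), it holds that, for every sequence $\theta_0, \theta_1, \ldots, \theta_m \in \Theta$,

$\mathcal{K}^{1/2}(\theta_1, \theta_2) \le a\{\mathcal{K}^{1/2}(\theta_1, \theta_0) + \mathcal{K}^{1/2}(\theta_2, \theta_0)\}$,

$\mathcal{K}^{1/2}(\theta_0, \theta_m) \le a\{\mathcal{K}^{1/2}(\theta_0, \theta_1) + \cdots + \mathcal{K}^{1/2}(\theta_{m-1}, \theta_m)\}$.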

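With given constants $\mathfrak{z}_k$, define, for $k \ge 1$, the random sets

$A_k = \{T_k \le \mathfrak{z}_k\}$, $A^{(k)} = A_1 \cap \cdots \cap A_k$.

Obviously, $\hat\theta_k = \tilde\theta_k$ on $A^{(k)}$, for all $k \le K$, and we have to bound the risk of $\hat\theta_k$
on the complement $\bar{A}^{(k)}$ of $A^{(k)}$. Define $B_l = A^{(l-1)} \setminus A^{(l)}$. The event $B_l$ means
the false alarm at step $l$ and hence, $\hat{k} = l - 1$. We aim to bound the portion
of the risk $E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)|^r = E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \tilde\theta_{l-1})|^r$ associated with the false
alarm $B_l$, for every $l \le k$. The definition of $B_l$ implies that
$1(B_l) \le 1(T_l > \mathfrak{z}_l)$. Lemma 4.3 and the elementary inequality
$(a + b)^{2r} \le 2^{(2r-1)_+}(a^{2r} + b^{2r})$ for any $a, b \ge 0$ imply

$E_{\theta^*} \mathcal{K}^r(\tilde\theta_k, \tilde\theta_{l-1}) 1(B_l) \le 2^{(2r-1)_+} a^{2r} E_{\theta^*}\{\mathcal{K}^r(\tilde\theta_k, \theta^*) + \mathcal{K}^r(\tilde\theta_{l-1}, \theta^*)\} 1(B_l)$.

The interval $\mathcal{J}_l$ is of length at most $N_l$, and by Lemma 4.2,

$E_{\theta^*} \mathcal{K}^r(\tilde\theta_k, \tilde\theta_{l-1}) 1(B_l)$

$\le 2^{(2r-1)_+} a^{2r} E_{\theta^*}\{\mathcal{K}^r(\tilde\theta_k, \theta^*) + \mathcal{K}^r(\tilde\theta_{l-1}, \theta^*)\} 1(T_{\mathcal{I}_{l+1}, \mathcal{J}_l} > \mathfrak{z}_l)$

$\le 2^{(2r-1)_+} a^{2r} \, 2 C_r N_l \mathfrak{z}_l^r e^{-\mathfrak{z}_l/2} (N_k^{-r} + N_{l-1}^{-r})$.

This and Theorem 2.4 imply, for every $k \le K$,

$E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)|^r \le N_k^r \sum_{l=1}^{k} E_{\theta^*} \mathcal{K}^r(\tilde\theta_k, \tilde\theta_{l-1}) 1(B_l)$

$\le 2^{(2r-1)_+} a^{2r} \sum_{l=1}^{k} \Bigl(1 + \frac{N_k^r}{N_{l-1}^r}\Bigr) 2 C_r N_l \mathfrak{z}_l^r e^{-\mathfrak{z}_l/2}$.

It remains to check using the condition (MD) that the choice $\mathfrak{z}_k = a_0 \log K +
2\log \alpha^{-1} + 2r\log(N_K/N_k) + 2\log N_k$, with properly selected $a_0$, provides the
required bound $E_{\theta^*} |N_k \mathcal{K}(\tilde\theta_k, \hat\theta_k)|^r \le \alpha \mathfrak{r}_r$. □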

4.2. Behavior under "small modeling bias" condition. Now, we extend
the previous result to the situation when the parametric assumption is not
precisely fulfilled but the deviation from the parametric structure within
the considered local model is sufficiently small. At step $k$, the procedure
involves the interval $\mathcal{I}_{k+1}$ used for testing a change point within $\mathcal{J}_k$.
Therefore, the deviation from the parametric situation can be measured for
step $k$ by

$\Delta_k(\theta) := \Delta_{\mathcal{I}_{k+1}}(\theta)$

[see (2.4)]. By definition, the modeling bias $\Delta_k(\theta)$ increases with $k$. We
suppose that there is a number $k^*$ such that $\Delta_k(\theta)$ is small for some $\theta$ and
$k = k^*$, and hence also for all $k \le k^*$. Consider the corresponding estimate
$\hat\theta_{k^*}$ obtained after the first $k^*$ steps of the algorithm. Theorem 2.5 implies
in this situation the following result.

Theorem 4.4. Assume (MD). Let $\theta$ and $k^*$ be such that $E\Delta_{k^*}(\theta) \le \Delta$,
for some $\Delta \ge 0$. Then,

$E \log\Bigl(1 + \frac{|N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \hat\theta_{k^*})|^r}{\alpha \mathfrak{r}_r}\Bigr) \le 1 + \Delta$.

4.3. "Stability after propagation" and "oracle" results. Due to the
"propagation" result, the procedure performs well as long as the "small modeling
bias" condition $\Delta_k(\theta) \le \Delta$ is fulfilled. To establish the accuracy result for
the final estimate $\hat\theta$, we have to check that the aggregated estimate $\hat\theta_k$ does
not vary much at the steps "after propagation" when the divergence $\Delta_k(\theta)$
from the parametric model becomes large.

Theorem 4.5. Suppose (MD) and ($\Theta$). Let, for some $k \le K$, the
interval $\mathcal{I}_k$ be accepted by the procedure and hence, $\hat\theta_k = \tilde\theta_k$. Then, it holds
that

(4.2) $N_k \mathcal{K}(\hat\theta_k, \hat\theta_{k+1}) \le \mathfrak{z}_k$.

Moreover, under (MD), it holds for every $k'$ with $k < k' \le K$

(4.3) $N_k \mathcal{K}(\hat\theta_k, \hat\theta_{k'}) \le a^2 c_u^2 \bar{\mathfrak{z}}_k$

with $c_u = (u^{-1/2} - 1)^{-1}$ and $\bar{\mathfrak{z}}_k = \max_{l \ge k} \mathfrak{z}_l$.

Remark 4.2. An interesting feature of this result is that it is fulfilled 
with probability one; that is, the control of stability "works" not only with 
a high probability, but it always applies. This property follows directly from 
the construction of the procedure. 

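Proof of Theorem 4.5. If $\mathcal{I}_{k+1}$ is rejected, then $\hat\theta_{k+1} = \hat\theta_k$ and the
assertion (4.2) trivially follows. Now, we consider the case when $\mathcal{I}_{k+1}$ is
accepted, yielding $\hat\theta_k = \tilde\theta_k$ and $\hat\theta_{k+1} = \tilde\theta_{k+1}$. The acceptance of $\mathcal{I}_k$ implies,
by definition of the procedure, that $T_k \stackrel{\mathrm{def}}{=} T_{\mathcal{I}_{k+1}, \mathcal{J}_k} \le \mathfrak{z}_k$ and, in particular,
$T_{\mathcal{I}_{k+1}, \tau} \le \mathfrak{z}_k$ with $\tau = t^* - N_k$ being the left edge of $\mathcal{J}_k$. This yields [see (3.2)]
that

$N_k \mathcal{K}(\tilde\theta_k, \tilde\theta_{k+1}) \le \mathfrak{z}_k$

and the assertion (4.2) is proved.

Now, assumption ($\Theta$) and Lemma 4.3 yield

$\mathcal{K}^{1/2}(\hat\theta_k, \hat\theta_{k'}) \le a \sum_{j=k}^{k'-1} \mathcal{K}^{1/2}(\hat\theta_j, \hat\theta_{j+1}) \le a \sum_{j=k}^{k'-1} (\mathfrak{z}_j/N_j)^{1/2}$.

The use of assumption (MD) leads to the bound

$a \sum_{j=k}^{k'-1} (\mathfrak{z}_j/N_j)^{1/2} \le a c_u (\bar{\mathfrak{z}}_k/N_k)^{1/2}$,

which proves (4.3). □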

Combination of the "propagation" and "stability" statements implies the
main result concerning the properties of the adaptive estimate $\hat\theta$.

Theorem 4.6 ["Oracle" property]. Let $E\Delta_k(\theta) \le \Delta$, for some $\theta \in \Theta$
and $k \le k^*$. Then, $\hat\theta$ is close to the "oracle" estimate $\tilde\theta_{k^*}$ in the sense that

$N_{k^*} \mathcal{K}(\hat\theta_{k^*}, \hat\theta) \le a^2 c_u^2 \bar{\mathfrak{z}}_{k^*}$.

The result claims the "oracle" property of the estimate $\hat\theta$ in the sense that
it belongs with a high probability to a confidence set of the oracle estimate
$\tilde\theta_{k^*}$, and thus, there is no significant difference between $\hat\theta$ and $\tilde\theta_{k^*}$.

We also present one corollary about the risk of adaptive estimation for
$r = 1/2$. An extension to an arbitrary $r > 0$ is straightforward.

Theorem 4.7. Assume (MD) and $E\Delta_{k^*}(\theta) \le \Delta$, for some $k^*$, $\theta$ and $\Delta$.
Then,

$E \log\Bigl(1 + \frac{|N_{k^*} \mathcal{K}(\hat\theta, \theta)|^{1/2}}{a \mathfrak{r}_{1/2}}\Bigr) \le \log\Bigl(1 + \frac{c_u \bar{\mathfrak{z}}_{k^*}^{1/2}}{\mathfrak{r}_{1/2}}\Bigr) + \Delta + \alpha + 1$,

where $c_u$ is the constant from Theorem 4.5.

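Proof. By Lemma 4.3, similarly to the proof of Theorem 4.5,

$a^{-1} |N_{k^*} \mathcal{K}(\hat\theta, \theta)|^{1/2}$

$\le |N_{k^*} \mathcal{K}(\hat\theta_{k^*}, \hat\theta)|^{1/2} + |N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \hat\theta_{k^*})|^{1/2} + |N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \theta)|^{1/2}$

$\le c_u \bar{\mathfrak{z}}_{k^*}^{1/2} + |N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \hat\theta_{k^*})|^{1/2} + |N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \theta)|^{1/2}$.

This, the elementary inequality $\log(1 + a + b) \le \log(1 + a) + \log(1 + b)$ for
all $a, b \ge 0$, Lemma 2.6, Theorem 2.4 and (3.4) yield

$E \log\bigl(1 + (a \mathfrak{r}_{1/2})^{-1} N_{k^*}^{1/2} \mathcal{K}^{1/2}(\hat\theta, \theta)\bigr)$

$\le \log\Bigl(1 + \frac{c_u \bar{\mathfrak{z}}_{k^*}^{1/2}}{\mathfrak{r}_{1/2}}\Bigr) + E \log\Bigl(1 + \frac{|N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \hat\theta_{k^*})|^{1/2}}{\mathfrak{r}_{1/2}}\Bigr) + E \log\Bigl(1 + \frac{|N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \theta)|^{1/2}}{\mathfrak{r}_{1/2}}\Bigr)$

$\le \log\Bigl(1 + \frac{c_u \bar{\mathfrak{z}}_{k^*}^{1/2}}{\mathfrak{r}_{1/2}}\Bigr) + \Delta + \alpha + 1$

as required. □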

Remark 4.3. Recall that by Theorem 4.4, the "oracle" choice $k^*$ leads
to the risk bound for the loss $|N_{k^*} \mathcal{K}(\tilde\theta_{k^*}, \theta^*)|^{1/2}$ of the corresponding
estimate $\tilde\theta_{k^*}$. The adaptive choice states a similar bound but for the loss
$|N_{k^*} \mathcal{K}(\hat\theta, \theta^*)|^{1/2} / \bar{\mathfrak{z}}_{k^*}^{1/2}$. This means that the accuracy of the adaptive
estimate $\hat\theta$ is worse by the factor $\bar{\mathfrak{z}}_{k^*}^{1/2}$, which can be considered the payment for
adaptation. Due to Theorem 4.1, $\mathfrak{z}_{k^*}$ is bounded from above by $a_0 \log K +
2\log(N_{k^*}/\alpha) + 2r\log(N_K/N_{k^*})$. Therefore, the risk of the adaptive estimate
corresponds to the best possible risk among the family $\{\tilde\theta_k\}$, for the choice
$k = k^*$, up to a logarithmic factor in the sample size. Lepski, Mammen and
Spokoiny (1997) established a similar result in the regression setup for the
pointwise adaptive Lepski procedure. Combining the result of Theorem 4.7
with Theorem 2.8 yields the rate of adaptive estimation $(n^{-1} \log n)^{1/(2+d)}$
under Lipschitz smoothness of the function $f$ and the usual design
regularity [see Polzehl and Spokoiny (2006) for more details]. It was shown by
Lepski (1990) that in the problem of pointwise adaptive estimation this rate
is optimal and cannot be improved by any estimation method. This gives
an indirect proof of the optimality of our procedure. The factor $\bar{\mathfrak{z}}_{k^*}$ in the
accuracy of estimation cannot be removed or reduced in the rate because
otherwise a similar improvement would appear in the rate of estimation.

4.4. Switching regime model. A switching regime model is described by
a sequence $\nu_1 < \nu_2 < \cdots$ of Markov moments with respect to the filtration
$(\mathcal{F}_t)$ and by values $\theta_1, \theta_2, \ldots$, where each $\theta_j$ is $\mathcal{F}_{\nu_j}$-measurable. By definition,
$\sigma_t^2 = f(t) = \theta_j$, for $\nu_j \le t < \nu_{j+1}$, and $\sigma_t$ is constant for $t < \nu_1$. This is an
important special case of the model (2.1). It is worth mentioning that any
volatility process $\sigma_t$ can be approximated by a switching regime model. For
this special case, the above procedure has a very natural interpretation.
When estimating at the point $t^*$, we search for the largest interval of the form
$[t^* - m, t^*[$ which does not contain a change point. More precisely, with a
given sequence of interval-candidates $\mathcal{I}_k = [t^* - N_k, t^*[$, we are looking for
the largest homogeneous interval among them. This is done via successive
testing for a change point within the intervals $\mathcal{J}_k = [t^* - N_k, t^* - N_{k-1}[$.
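The successive testing just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation behind the reported results: the names `kl` and `lcp_estimate` are hypothetical, the Kullback-Leibler divergence is taken for centered normal laws with variances $\theta$ and $\theta'$, the local estimate on an interval is the mean of squared returns, and the critical values `z` are assumed to be given (for instance from a Monte Carlo calibration).

```python
import math

def kl(theta, theta_ref):
    """KL divergence between N(0, theta) and N(0, theta_ref); clipped at 0 for rounding."""
    ratio = theta / theta_ref
    return max(0.5 * (ratio - 1.0 - math.log(ratio)), 0.0)

def lcp_estimate(returns, lengths, z):
    """Adaptive variance estimate at the end of `returns`.

    lengths = [N_0, ..., N_{K+1}] (increasing), z = [z_1, ..., z_K].
    Returns (estimate, length of the selected interval).
    """
    K = len(lengths) - 2
    if len(z) != K or len(returns) < lengths[-1]:
        raise ValueError("need K critical values and at least N_{K+1} returns")
    csum = [0.0]
    for value in reversed(returns):            # most recent observation first
        csum.append(csum[-1] + value * value)  # csum[m] = sum of the m latest squares
    selected = 0                               # the shortest interval is always accepted
    for k in range(1, K + 1):
        n_test = lengths[k + 1]                # testing interval for step k
        theta_test = csum[n_test] / n_test
        T_k = 0.0
        for m in range(lengths[k - 1] + 1, lengths[k] + 1):   # candidate change points
            right = csum[m] / m
            left = (csum[n_test] - csum[m]) / (n_test - m)
            stat = m * kl(right, theta_test) + (n_test - m) * kl(left, theta_test)
            T_k = max(T_k, stat)
        if T_k > z[k - 1]:
            break                              # change point detected: stop extending
        selected = k
    n_sel = lengths[selected]
    return csum[n_sel] / n_sel, n_sel

# Toy path: variance 100 in the past, variance 1 over the last 7 observations.
returns = [10.0] * 16 + [1.0] * 7
lengths = [5, 6, 7, 9, 12, 15, 19, 23]
estimate, n_selected = lcp_estimate(returns, lengths, z=[5.0] * 6)
print(estimate, n_selected)
```

In this toy example the split at the left edge of the second tested interval coincides with the jump, so the procedure stops there and keeps the previously accepted interval.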

The construction of the procedure automatically provides the prescribed
risk level associated with the first kind error (a false alarm). In this section,
we aim to show that the procedure ensures a near-optimal quality of change
point detection. The quality (sensitivity) of a change point procedure is
usually measured by the mean delay between the occurrence of a change
point and its detection. To study this property of the proposed method, we
consider the case of estimation at a point $t^*$ shortly after a change point $\nu$.
The "ideal" choice $\mathcal{I}_{k^*}$ among $\mathcal{I}_1, \ldots, \mathcal{I}_K$ is obviously the largest one which
does not contain $\nu$. Theorem 4.7 claims that the procedure accepts with a
high probability all the intervals $\mathcal{I}_k$ for which the testing interval $\mathcal{I}_{k+1}$ does
not contain the point of change $\nu$. This particularly implies that the quality
of estimation of $\theta_{t^*}$ by our adaptive procedure is essentially the same as if
we knew the latest change point $\nu$ a priori. Now we additionally show that
the procedure rejects with a high probability the first interval $\mathcal{I}_{k^*+1}$ which
contains the point of change $\nu$, provided that the magnitude of the change is
sufficiently large. This fact can be treated as the sensitivity of the procedure
to changes of regime.

In our study, we assume that the changes occur not too often, so that there
is exactly one change within $\mathcal{J}_{k^*+1}$ and, moreover, within the larger interval
$\mathcal{I}_{k^*+2}$ which is used as the testing interval for $\mathcal{J}_{k^*+1}$. Let $\theta'$ be the value
of the parameter before the change and $\theta''$ after it. The point $\tau$ splits the
interval $I = \mathcal{I}_{k^*+2}$ into two homogeneous intervals: $f(t) = \theta''$ for $t \in I'' =
[\tau, t^*[$, while $f(t) = \theta'$ within the complementary interval $I \setminus I''$. Define
$c_1 = N_{k^*}/N_{k^*+2}$, $c_2 = N_{k^*+1}/N_{k^*+2}$. By condition (MD), $c_1 \ge u_0^2$ and $c_2/c_1 \ge u^{-1}$.
The length $t^* - \tau$ of the interval $[\tau, t^*[$ fulfills $c_1 \le (t^* - \tau)/N_{k^*+2} \le c_2$.
Based on these considerations, define the following measure of change from
$\theta'$ to $\theta''$:

(4.4) $d^2(\theta', \theta'') = \inf_{\theta} \inf_{c \in [c_1, c_2]} \{(1 - c)\mathcal{K}(\theta', \theta) + c\,\mathcal{K}(\theta'', \theta)\}$.

The following simple bound can be useful for bounding the distance $d^2(\theta', \theta'')$.

Lemma 4.8. There is a constant $b > 0$ depending on $c_1$ and $c_2$ only such
that

$d^2(\theta', \theta'') \ge b(\theta'/\theta'' - \theta''/\theta')^2$.

Proof. For every fixed $\theta'$, $\theta''$, $\theta$, the expression $(1 - c)\mathcal{K}(\theta', \theta) + c\,\mathcal{K}(\theta'', \theta)$
is a linear function of $c$. Therefore, its minimum with regard to $c$ is attained
at one of the edge points $c_1$, $c_2$, and it suffices to check the assertion only for
these two values of $c$. Now, the assertion follows directly from the definition
of the Kullback-Leibler distance $\mathcal{K}(\theta', \theta)$ as a smooth function of the ratio
$\theta'/\theta$ with $\mathcal{K}(\theta, \theta) = 0$. □

We aim to show that if the contrast $d(\theta', \theta'')$ is sufficiently large then the
test statistic $T_{k^*+1}$ will be large as well, yielding that the interval $\mathcal{I}_{k^*+1}$ will
be rejected with a high probability.
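For a concrete feeling of the contrast defined in (4.4), it can be evaluated in closed form when the Kullback-Leibler divergence is the one between centered normal laws with variances $\theta$ and $\theta'$, that is, $\mathcal{K}(\theta, \theta') = \frac{1}{2}\{\theta/\theta' - 1 - \log(\theta/\theta')\}$; this specific form is an assumption of the sketch, and the names `kl` and `contrast_sq` are hypothetical. For fixed $c$ the infimum over $\theta$ is attained at the mixture $(1-c)\theta' + c\theta''$, and the resulting function of $c$ is concave, so only the edge points $c_1$, $c_2$ have to be checked.

```python
import math

def kl(theta, theta_ref):
    """KL divergence between N(0, theta) and N(0, theta_ref); clipped at 0 for rounding."""
    ratio = theta / theta_ref
    return max(0.5 * (ratio - 1.0 - math.log(ratio)), 0.0)

def contrast_sq(theta1, theta2, c1, c2):
    """Contrast d^2(theta', theta'') of (4.4) for the Gaussian-variance divergence."""
    def inner(c):
        # For fixed c the infimum over theta is attained at the mixture value.
        theta = (1.0 - c) * theta1 + c * theta2
        return (1.0 - c) * kl(theta1, theta) + c * kl(theta2, theta)
    # inner(c) is an infimum of functions linear in c, hence concave,
    # so its minimum over [c1, c2] is attained at an endpoint.
    return min(inner(c1), inner(c2))

d2 = contrast_sq(1.0, 4.0, 0.3, 0.6)
print(d2)
```

Multiplying such a value by the length of the testing interval gives the quantity that has to dominate the critical value for the change to be detected.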
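
Theorem 4.9. Let $f(t) = \theta'$ before the change point at $\nu$ and $f(t) = \theta''$
after it. If, for some $\mathfrak{z} > 0$,

(4.5) $N_{k^*+2}\, d^2(\theta', \theta'') \ge 2a^2(\mathfrak{z}_{k^*+1} + \mathfrak{z})$,

then

$P(\mathcal{I}_{k^*+1} \text{ is not rejected}) \le 4e^{-\mathfrak{z}}$.

Proof. Let $\nu$ be the location of the change within $\mathcal{J}_{k^*+1} = \mathcal{I}_{k^*+1} \setminus \mathcal{I}_{k^*}$.
It suffices to show that, under the conditions of the theorem, the corresponding
test statistic $T_{I,\nu}$ exceeds with a high probability the value $\mathfrak{z}_{k^*+1}$.
This would ensure that the interval $\mathcal{I}_{k^*+1}$ is rejected. The point $\nu$ splits the
testing interval $I = \mathcal{I}_{k^*+2}$ into two subintervals $I'$ and $I''$, and within each
of the intervals $I'$ and $I''$ the function $f(t)$ is constant: $f(t) = \theta'$ for $t \in I'$,
and $f(t) = \theta''$ for $t \in I''$. Let a value $\mathfrak{z} > 0$ be fixed. Introduce the event

$A(\mathfrak{z}) = \{N_{I'} \mathcal{K}(\tilde\theta_{I'}, \theta') \le \mathfrak{z},\ N_{I''} \mathcal{K}(\tilde\theta_{I''}, \theta'') \le \mathfrak{z}\}$.

By Theorem 2.2,

$P(A(\mathfrak{z})) \ge 1 - 4e^{-\mathfrak{z}}$.

We now consider $\mathfrak{z}$ such that (4.5) holds and show that $T_{I,\nu} > \mathfrak{z}_{k^*+1}$ on
$A(\mathfrak{z})$. By definition, it holds on the set $A(\mathfrak{z})$ that $N_{I'} \mathcal{K}(\tilde\theta_{I'}, \theta') \le \mathfrak{z}$ and
$N_{I''} \mathcal{K}(\tilde\theta_{I''}, \theta'') \le \mathfrak{z}$. By Lemma 4.3,

$\mathcal{K}^{1/2}(\theta', \tilde\theta_I) \le a \mathcal{K}^{1/2}(\tilde\theta_{I'}, \theta') + a \mathcal{K}^{1/2}(\tilde\theta_{I'}, \tilde\theta_I)$

$\le a(\mathfrak{z}/N_{I'})^{1/2} + a \mathcal{K}^{1/2}(\tilde\theta_{I'}, \tilde\theta_I)$.

Hence,

$\mathcal{K}(\theta', \tilde\theta_I) \le 2a^2 \mathfrak{z}/N_{I'} + 2a^2 \mathcal{K}(\tilde\theta_{I'}, \tilde\theta_I)$

and

$\mathcal{K}(\tilde\theta_{I'}, \tilde\theta_I) \ge (2a^2)^{-1} \mathcal{K}(\theta', \tilde\theta_I) - \mathfrak{z}/N_{I'}$.

Similarly,

$\mathcal{K}(\tilde\theta_{I''}, \tilde\theta_I) \ge (2a^2)^{-1} \mathcal{K}(\theta'', \tilde\theta_I) - \mathfrak{z}/N_{I''}$.

Now, by definition of $T_{I,\nu}$ [see (3.2)],

$T_{I,\nu} = N_{I'} \mathcal{K}(\tilde\theta_{I'}, \tilde\theta_I) + N_{I''} \mathcal{K}(\tilde\theta_{I''}, \tilde\theta_I)$

$\ge (2a^2)^{-1}\{N_{I'} \mathcal{K}(\theta', \tilde\theta_I) + N_{I''} \mathcal{K}(\theta'', \tilde\theta_I)\} - \mathfrak{z}$

$= (2a^2)^{-1} N_{k^*+2}\{c\,\mathcal{K}(\theta', \tilde\theta_I) + (1 - c)\mathcal{K}(\theta'', \tilde\theta_I)\} - \mathfrak{z}$

with $c = N_{I'}/N_{k^*+2}$. This and the definition of $d(\theta', \theta'')$ [see (4.4)] yield on
$A(\mathfrak{z})$

$T_{I,\nu} \ge (2a^2)^{-1} N_{k^*+2}\, d^2(\theta', \theta'') - \mathfrak{z}$.

The theorem assertion follows. □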

The result of Theorem 4.9 delivers some additional information about the
sensitivity of the proposed procedure to changes in the volatility parameter.
One possible question concerns the minimal delay $m^*$ between the change
point $\nu$ and the first moment when the procedure starts to detect this
change. Due to Theorem 4.9, the change will be "detected" with a high
probability if (4.5) is met. With fixed $\theta' \ne \theta''$, condition (4.5) is fulfilled if
$m^*$ is larger than a prescribed constant, that is, we need only a finite number
of observations to detect a change point. In general, $m^*$ should be of order
$d^{-2}(\theta', \theta'') \asymp |\theta' - \theta''|^{-2}$ if the size of the change becomes small. All these
issues are in agreement with the theory of change point detection [see, e.g.,
Pollak (1985), Csorgo and Horvath (1997) and Brodskij and Darkhovskij
(1993)] and with our numerical results from Section 5.

5. Simulated results and applications. This section illustrates the per- 
formance of the proposed local change point detection (LCP) procedure by 
means of some simulated and real data sets. We aim to show that the theo- 
retical properties of the method derived in the previous section are confirmed 
by the numerical results. We focus especially on the two main features of the 
method: stability under homogeneity and sensitivity to changes of volatility. 

5.1. Some simulated examples. Three different jump processes are
simulated, whose relative jump magnitudes are 3.00, 2.00 and 1.75, respectively.
Each process is simulated and estimated 1000 times, and the median and
the quartiles of the estimates are plotted in Figure 3. We show the results
for the final estimate $\hat\theta$ and for the length of the selected interval $\hat{\mathcal{I}}$. One can
see that, if the size of the change is large enough, the procedure performs as
if the location of the change were known. As one can expect, the sensitivity
of the change point detection decreases when the magnitude of the jump
becomes smaller. However, the accuracy of the volatility estimate remains
rather good even for small jumps, which corresponds to our theoretical results.

The algorithm proposed in this paper is compared with the LAVE
procedure from Mercurio and Spokoiny (2004) with the optimized tuning
parameters $\gamma = 0.5$, $M = 40$ and $\mathfrak{z} = 2.40$. Figure 4 shows the quartiles of estimation
for the two approaches for the model with the relative jump magnitude 3.
One can see that the new procedure outperforms LAVE both with respect
to the variance and to the bias of the estimator, especially for the points
immediately after the changes.

Our simulation study has been done for the conditional normal model
(2.1). We mentioned in Section 2.1 that this assumption is questionable as
far as real financial data are concerned. To gain an impression of
the robustness of the method against violation of the normality assumption,
we also simulated using i.i.d. innovations from the $t_5$-distribution with five
degrees of freedom. The results are shown in Figure 5. As one can expect,
they are slightly worse than in the case of normal innovations; however, the
procedure continues to work in a quite reasonable way. The sensitivity of the
procedure remains as good as with normal innovations, but the probability
of rejecting a homogeneous interval becomes larger. This results in a higher
variability of the estimated volatility.

Fig. 3. A process with Gaussian innovations and jumps of different magnitudes. Top
panel: jump process (thin solid line), pointwise median (solid line) and quartiles (dashed
lines) for the estimates $\hat\theta_t$. Bottom panel: length of the selected interval $\hat{\mathcal{I}}_t$ (solid line) and
its quartiles (dashed lines). The results were obtained with parameters $r = 0.5$ and $\alpha = 0.2$
and interval lengths 5,7,10,13,16,20,24,30,38,47,59,73,92 points.

Fig. 4. Comparison of the proposed estimator with the one from Mercurio and Spokoiny
(2004) for change point model with $\theta/\theta' = 3$. Quartiles of $\hat\theta$ for the LCP method (solid
lines) and for the LAVE method from Mercurio and Spokoiny (2004) (dotted lines) against
the true volatility (thick line).

5.2. Volatility estimation for different exchange rate data sets. The
volatility estimation is performed on a set of nine exchange rates, which are
available from the web page of the US Federal Reserve. The data sets represent
daily exchange rates of the US Dollar (USD) against the following currencies:
Australian Dollar (AUD), British Pound (GBP), Canadian Dollar (CAD),
Danish Krone (DKR), Japanese Yen (JPY), Norwegian Krone (NKR), New
Zealand Dollar (NZD), Swedish Krone (SKR) and Swiss Franc (SFR). The
period under consideration goes from January 1, 1990 to April 7, 2000. For
each time series we have 2583 observations. All selected time series display
excess kurtosis and volatility clustering.

Figure 6 shows the GBP/USD exchange rate returns together with the
volatility estimated with the default parameters. The results of the
estimation are in accordance with the data, and the procedure seems to recognize
changes in the underlying volatility process quickly.

Fig. 5. Estimation results with respect to jump processes with jumps of different
magnitudes. The results are obtained with tuning parameters $r = 0.5$ and $\alpha = 0.2$ and
interval lengths 5,7,10,13,16,20,24,30,38,47,59,73,92 points. The conditional distribution is
scaled Student $t_5$ with five degrees of freedom.

The assumption of local homogeneity leads to the constant forecast of
the volatility $\sigma_{t+h}$ for a small or moderate time horizon $h$. This results in
the following forecast of the conditional variance of the aggregated returns
$R_{t+1} + \cdots + R_{t+h}$:

$V^{\mathrm{LCP}}_{t,h} := h\hat\theta_t$.

In order to assess the performance of the proposed algorithm, we compare
its forecasting ability with that of the GARCH(1, 1) model, which
represents one of the most popular parameterizations of the volatility process of
financial time series. The GARCH(1, 1) model is described by the following
equations:

$R_t = \sigma_t \varepsilon_t$, $\sigma_t^2 = \omega + \alpha R_{t-1}^2 + \beta \sigma_{t-1}^2$,

$\alpha > 0$, $\beta > 0$, $\alpha + \beta < 1$, $\varepsilon_t \sim N(0, 1)$ $\forall t$.

The $h$-step ahead variance forecast of the GARCH(1, 1) is given by

$\sigma^{2,\mathrm{GARCH}}_{t+h|t} := E_t R_{t+h}^2 = \bar\sigma^2 + (\alpha + \beta)^h (\sigma_t^2 - \bar\sigma^2)$,

where $\bar\sigma$ represents the unconditional volatility and $E_t \xi$ means $E(\xi | \mathcal{F}_t)$ [see
Mikosch and Starica (2000a)]. Since the returns are conditionally
uncorrelated, the conditional variance of the aggregated returns is given by the sum
of the conditional variances:

$V^{\mathrm{GARCH}}_{t,h} := E_t (R_{t+1} + \cdots + R_{t+h})^2 = \sum_{k=1}^{h} E_t R_{t+k}^2 = \sum_{k=1}^{h} \sigma^{2,\mathrm{GARCH}}_{t+k|t}$.
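The two variance forecasts compared in this section can be computed with a few lines of Python. The sketch below is an illustration under assumptions: the function names are hypothetical, the GARCH parameters are taken as already estimated, and the multi-step forecast uses the standard mean-reverting recursion $E_t\sigma^2_{t+k+1} = \omega + (\alpha+\beta)E_t\sigma^2_{t+k}$ started from the one-step variance $\sigma^2_{t+1}$, which is known at time $t$; the time indexing may therefore differ from the convention of the displayed formula.

```python
def garch_forecasts(omega, alpha, beta, sigma2_next, h):
    """Forecasts E_t R_{t+k}^2, k = 1..h, started from sigma^2_{t+1} (known at time t)."""
    persistence = alpha + beta
    if omega <= 0 or alpha < 0 or beta < 0 or persistence >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    sigma2_bar = omega / (1.0 - persistence)   # unconditional variance
    return [sigma2_bar + persistence ** (k - 1) * (sigma2_next - sigma2_bar)
            for k in range(1, h + 1)]

def aggregated_variance_garch(omega, alpha, beta, sigma2_next, h):
    """Forecast of the variance of R_{t+1} + ... + R_{t+h} under GARCH(1, 1)."""
    return sum(garch_forecasts(omega, alpha, beta, sigma2_next, h))

def aggregated_variance_lcp(theta_hat, h):
    """Locally constant forecast: h times the current adaptive variance estimate."""
    return h * theta_hat

forecasts = garch_forecasts(omega=0.1, alpha=0.1, beta=0.8, sigma2_next=2.0, h=5)
print([round(v, 4) for v in forecasts])
print(aggregated_variance_garch(0.1, 0.1, 0.8, 2.0, 5), aggregated_variance_lcp(2.0, 5))
```

The GARCH forecast decays geometrically toward the unconditional variance, whereas the locally constant forecast keeps the current level over the whole horizon.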



The assumption of constant parameters for a GARCH(1, 1) model over a
time interval of the considered length of about 2500 time points can be too
restrictive. We therefore considered a scrolling estimate, that is, for every
date, the preceding 1000 observations are used for estimation of the GARCH
parameters, and then the estimated parameters are used to forecast the
variance at different horizons. This method is nonadaptive in the choice of
the observation window, but it takes advantage of a more flexible
GARCH modeling. The LCP algorithm suggested in this paper applies a very simple
local constant modeling but benefits from a data-driven choice of the interval
of homogeneity.

Fig. 6. Returns and estimated volatility for the GBP/USD exchange rate.

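The quality of forecasting is measured by comparing the forecasts $V^{\mathrm{LCP}}_{t,h}$,
respectively, $V^{\mathrm{GARCH}}_{t,h}$ with the realized volatility

$V_{t,h} := R_{t+1}^2 + \cdots + R_{t+h}^2$.

We apply the following mean square root error criterion (MSqE) for an
interval $I$:

$\mathrm{MSqE}_I = \sum_{t \in I} |V^{\mathrm{LCP}}_{t,h} - V_{t,h}|^{1/2} \Big/ \sum_{t \in I} |V^{\mathrm{GARCH}}_{t,h} - V_{t,h}|^{1/2}$.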
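The criterion is straightforward to compute once the forecasts and the realized volatilities are available. The following Python sketch shows one way to do it; the function names `realized_volatility` and `msqe_ratio` are hypothetical, and the 0-based indexing of the return series is an assumption of the sketch.

```python
def realized_volatility(returns, t, h):
    """V_{t,h} = R_{t+1}^2 + ... + R_{t+h}^2 for a 0-based list of returns."""
    window = returns[t + 1:t + h + 1]
    if len(window) < h:
        raise ValueError("not enough observations after index t")
    return sum(r * r for r in window)

def msqe_ratio(lcp_forecasts, garch_forecasts, realized):
    """Ratio of summed square-root errors of the LCP and GARCH forecasts."""
    if not (len(lcp_forecasts) == len(garch_forecasts) == len(realized)):
        raise ValueError("all three sequences must have the same length")
    num = sum(abs(v - y) ** 0.5 for v, y in zip(lcp_forecasts, realized))
    den = sum(abs(v - y) ** 0.5 for v, y in zip(garch_forecasts, realized))
    if den == 0:
        raise ValueError("the GARCH forecasts are exact; the ratio is undefined")
    return num / den

ratio = msqe_ratio([1.0, 4.0], [0.0, 9.0], [0.0, 0.0])
print(ratio)
```

Values below one indicate that the LCP forecasts are closer to the realized volatility than the GARCH forecasts on the considered interval.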

The MSqE is considered instead of the more common MSE for robustness 
reasons. Actually, in this way outliers are prevented from having a strong 
influence on the results. The MSqE is computed for six nonoverlapping in- 
tervals of 250 observations, and the results are shown in Table 1. One can 
observe that both methods are comparable and that the relative performance 
depends on the particular situation at hand. For periods with stable volatil- 
ity the LCP forecast is clearly better, but for periods with high volatility 
variation the GARCH method is slightly preferable. 

5.3. Analysis of standardized returns. Our model (2.1) assumes
standard normal innovations $\varepsilon_t$. Many empirical studies have argued that this
assumption is too strong and often violated [see, e.g., McNeil and Frey (2000)].
Here, we briefly discuss this issue by looking at the standardized returns
$\hat\varepsilon_t = R_t/\hat\sigma_t$. The first observation is that even after standardization by the
estimated variance, the density of standardized returns still displays tails
which are fatter than the normal. We illustrate this effect in Figure 7 where
the kernel estimate of the density of standardized returns $R_t/\hat\sigma_t$ is plotted
against the normal density and the scaled Student $t_5$ density with five
degrees of freedom. One can observe that the $t$-distribution delivers a much
better approximation to the empirical density of returns.

The volatility clustering effect, though, disappears after standardization 
and autocorrelations of squared returns are not significant any more (see 
Figure 8 for the case of GBP/USD returns). The other exchange rate exam- 
ples deliver similar results. A short conclusion of this empirical study is that 
the standardized returns can be treated as i.i.d. random variables with a 
distribution whose tails are fatter than the ones of the normal distribution. 
Table 1

Forecasting performance (MSqE) of LCP relative to GARCH(1, 1)
on six consecutive time periods of 250 observations each

Currency  Horizon   Period 1  Period 2  Period 3  Period 4  Period 5  Period 6
CAD       h = 1     0.994     0.983     0.833     0.967     1.022     0.998
          h = 5     0.941     0.999     0.720     0.994     1.105     1.009
          h = 10    0.862     1.038     0.645     0.960     1.149     0.999
DKR       h = 1     0.881     0.924     0.844     0.979     0.976     1.013
          h = 5     0.849     0.968     0.802     1.035     0.987     1.007
          h = 10    0.870     0.971     0.691     1.053     0.986     0.989
JPY       h = 1     0.931     0.987     0.892     1.004     1.021     0.992
          h = 5     0.876     1.006     0.858     1.002     1.032     0.998
          h = 10    0.889     0.978     0.828     1.033     1.061     1.001
AUD       h = 1     0.973     0.919     0.895     1.017     1.022     0.993
          h = 5     0.966     0.943     0.877     1.012     0.967     0.959
          h = 10    0.932     0.958     0.887     1.032     1.023     0.990
GBP       h = 1     0.874     0.969     0.904     1.029     0.947     0.960
          h = 5     0.814     0.960     0.914     1.090     0.941     0.952
          h = 10    0.775     0.890     0.884     1.087     0.972     0.949
NZD       h = 1     0.845     0.941     0.928     1.042     0.987     0.700
          h = 5     0.816     0.918     0.913     1.065     1.002     0.657
          h = 10    0.742     0.984     0.884     1.095     1.013     0.632



5.4. Application to value at risk. The Value at Risk (VaR) measures
the extreme loss of a portfolio over a predetermined holding period with a
prescribed confidence level $1 - \alpha$. This problem can be reduced to computing
the quantiles of the distribution of aggregated returns [see, e.g., Fan and Gu
(2003) for a recent overview of this topic].

Fig. 7. Kernel density estimate of exchange rate returns JPY/USD, normal density and
scaled Student's $t_5$ density with five degrees of freedom.

Fig. 8. ACF of the absolute GBP/USD returns (upper plot) and of the standardized
absolute GBP/USD returns (lower plot). Dotted straight line denotes the 95% significance
level.



Our modeling approach can easily be adapted to the VaR problem. Namely,
one may forecast the 1% and 5% quantiles of the next return $R_{t+1}$ and of
the aggregated returns $R_{t+1} + \cdots + R_{t+h} = \log(S_{t+h}/S_t)$, for
each date $t$, in the following way. The volatility parameter $\sigma_t$ is
estimated from the historical data $R_s$, for $s < t$, and one can consider
different distributions for the innovations $\varepsilon_t$. In our study, we
compare the Gaussian, the scaled Student $t_5$-distribution with five degrees
of freedom and the empirical distribution $F_t$ of the past empirical
innovations for $s < t$, that is,

$$R_{t+h} = \sigma_t \varepsilon_{t+h} \quad \text{with} \quad \varepsilon_{t+h} \sim \mathcal{N}(0,1) \quad \text{or} \quad \sqrt{5/3}\,\varepsilon_{t+h} \sim t_5 \quad \text{or} \quad \varepsilon_{t+h} \sim F_t.$$
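A minimal sketch of the one-step quantile forecast may help fix ideas. It covers only the Gaussian and the empirical innovation distributions, and it takes the volatility estimate as given, since the LCP estimation itself is not reproduced here. The function names `empirical_quantile` and `var_quantile_forecast`, the order-statistic definition of the empirical quantile, and the toy numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def empirical_quantile(sample, p):
    """Order-statistic quantile: smallest x whose empirical CDF is >= p."""
    xs = sorted(sample)
    k = max(math.ceil(p * len(xs)) - 1, 0)
    return xs[k]


def var_quantile_forecast(sigma_t, p, past_innovations=None):
    """p-quantile of R_{t+1} = sigma_t * eps_{t+1}.

    sigma_t          -- volatility estimate at date t (assumed given)
    past_innovations -- standardized past returns R_s / sigma_s, s < t;
                        if None, eps is taken to be N(0, 1)
    """
    if past_innovations is None:
        q_eps = NormalDist().inv_cdf(p)
    else:
        q_eps = empirical_quantile(past_innovations, p)
    return sigma_t * q_eps


# Toy inputs, purely illustrative (not data from the paper).
past_returns = [0.004, -0.011, 0.002, -0.006, 0.009, -0.001, 0.003]
past_sigmas = [0.006] * len(past_returns)
innovations = [r / s for r, s in zip(past_returns, past_sigmas)]

print(var_quantile_forecast(0.006, 0.01))               # Gaussian innovations
print(var_quantile_forecast(0.006, 0.05, innovations))  # empirical innovations
```

In practice the sample of past innovations would be far longer than in this toy example, so that the 1% empirical quantile is not simply the sample minimum.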

Similar approaches have been applied in McNeil and Frey (2000) with 
the use of the GARCH(1,1) model for estimating the volatility and 
extreme value theory for evaluating the distribution of returns, while 
Eberlein and Prause (2002) assume the Generalized Hyperbolic Distribu- 
tion for the innovations. 

In order to better interpret the results, we note that the scaled $t_5$
distribution has higher 5%-quantiles than the Gaussian at any of
the considered horizons and lower 1%-quantiles. Therefore, the Gaussian
distribution of innovations is more conservative for 5%-quantiles, while the
opposite is true for 1%-quantiles.
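This ordering can be checked numerically for a single innovation, that is, for the horizon h = 1 only; the sketch says nothing about the aggregated horizons. It uses the closed-form CDF of the Student t distribution with five degrees of freedom, inverts it by bisection, and rescales by $\sqrt{3/5}$ so that the innovation has unit variance. The helper names are illustrative.

```python
import math
from statistics import NormalDist


def t5_cdf(x):
    """Closed-form CDF of Student's t with 5 degrees of freedom."""
    theta = math.atan(x / math.sqrt(5.0))
    s, c = math.sin(theta), math.cos(theta)
    return 0.5 + (theta + s * (c + (2.0 / 3.0) * c ** 3)) / math.pi


def t5_quantile(p, lo=-50.0, hi=50.0, tol=1e-10):
    """Invert t5_cdf by bisection (the CDF is strictly increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t5_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scaled_t5_quantile(p):
    """Quantile of eps with sqrt(5/3) * eps ~ t_5, so that Var(eps) = 1."""
    return math.sqrt(3.0 / 5.0) * t5_quantile(p)


for p in (0.05, 0.01):
    gauss = NormalDist().inv_cdf(p)
    student = scaled_t5_quantile(p)
    print(f"p = {p}: Gaussian {gauss:.3f}, scaled t5 {student:.3f}")
```

At the 5% level the scaled Student quantile lies above the Gaussian one, and at the 1% level it lies below, in line with the remark above.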

We apply the procedure to the set of nine exchange rates with about
2500 observations in each one. The frequency with which the realized
returns overshoot the predicted quantile is given in Table 2. The first
500 observations in every time series are taken as a presample for estimating
the parameters. Note that for the five- and ten-day horizons, overlapping
intervals of data are used, as in Fan and Gu (2003).
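How such an overshooting frequency can be tallied with overlapping intervals is sketched below. The sketch assumes that an exception is counted whenever the realized (aggregated) return falls below the forecast quantile; the function names and the toy numbers are hypothetical, and aligning each forecast with the realization it refers to is left to the caller.

```python
def aggregated_returns(returns, h):
    """Overlapping h-day sums R_{t+1} + ... + R_{t+h} for every admissible t."""
    return [sum(returns[t:t + h]) for t in range(len(returns) - h + 1)]


def overshoot_frequency(realized, quantile_forecasts):
    """Percentage of dates on which the realized return is below the forecast."""
    pairs = list(zip(realized, quantile_forecasts))
    hits = sum(1 for r, q in pairs if r < q)
    return 100.0 * hits / len(pairs)


# Toy illustration with h = 2 and a constant forecast quantile.
returns = [0.01, -0.03, -0.02, 0.015, 0.005]
agg = aggregated_returns(returns, 2)
print(overshoot_frequency(agg, [-0.04] * len(agg)))
```

A well-calibrated forecast of the 1% quantile should give a frequency close to 1, and of the 5% quantile close to 5.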

According to the requirements of the regulators [BIS (1996)], a bank has
to determine its capital requirements for covering market risk
proportionally to the 1% quantile of the distribution of the portfolio losses
over a ten-day horizon. Internal models calculating this quantile are regularly 
monitored. The coefficient of proportionality is set to 3 for models whose 
performance is satisfactory (green zone), and it can be increased up to 4 
by a discretionary judgment of the regulators for models which appear to 
estimate the quantile imprecisely (yellow zone). If the model performance 
is considered very poor, the coefficient is automatically increased to 4 (red 
zone). 

Table 2

Percentage of overshooting the prescribed VaR level for six series of exchange rates for
nominal quantile levels 1% and 5%, three different distributions of innovations and time
horizon h = 1, 5, 10

                 Gaussian          Student $t_5$         Empirical
      h =      1     5    10       1     5    10       1     5    10

1% quantile
AUD          2.0   1.7   1.0     1.6   1.5   0.8     0.6   1.2   0.3
CAD          1.9   2.2   1.9     1.4   1.8   1.6     0.6   1.2   1.0
DKR          1.1   1.3   1.0     0.5   0.9   0.8     0.2   0.6   0.4
GBP          1.6   1.6   1.2     1.2   1.3   1.1     0.3   0.8   0.5
JPY          0.8   0.6   0.5     0.5   0.4   0.4     0.1   0.1   0.1
NZD          2.3   1.7   1.4     1.8   1.4   1.1     0.8   0.9   0.4

5% quantile
AUD          5.4   5.5   4.9     6.3   5.6   5.1     4.6   4.3   5.8
CAD          5.5   6.7   7.1     6.2   6.9   7.2     4.5   5.0   7.6
DKR          5.0   5.6   5.4     6.0   5.7   5.7     4.2   4.0   6.2
GBP          5.1   5.7   5.5     5.9   5.8   5.7     4.3   4.1   6.1
JPY          4.0   3.7   4.0     5.0   3.9   4.1     3.4   2.3   4.6
NZD          5.0   5.5   5.4     5.4   5.7   5.6     4.1   4.3   6.1

The official criterion for the evaluation of an internal model is the
statistical significance of the 1% quantile estimates of the portfolio loss
distribution over a one-day horizon. The prescribed procedure, called
backtesting, checks whether the observed frequency of days, out of the last
250, on which the losses exceeded the value computed by the VaR procedure
deviates significantly from the nominal level 0.01 [see Deutsch (2001)].
Every procedure is classified as green, yellow or red. The green zone means
that the empirical frequency is in agreement with the nominal probability
0.01. The yellow zone begins at the point at which the probability of
exceptions for the tested VaR procedure exceeds the value 0.01 at the 95%
confidence level. One can easily verify that this corresponds to 5 or more
exceptions out of 250 days, that is, to a frequency of exceptions of at
least 2%. Similarly, the red zone corresponds to the 99.99% level, which is
taken as evidence that the tested procedure does not provide the required
probability of exceptions. For a sample of 250 observations, this
corresponds to 10 or more exceptions, or equivalently, to a frequency of
overshooting the VaR value of at least 4%.
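The zone boundaries quoted above can be checked with a short binomial computation. The sketch assumes that the number of exceptions in 250 independent days follows a binomial distribution with success probability 0.01, and that a zone begins at the smallest count whose cumulative probability reaches the stated level; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def zone_threshold(level, n=250, p=0.01):
    """Smallest exception count k with P(X <= k) >= level."""
    k = 0
    while binom_cdf(k, n, p) < level:
        k += 1
    return k


def classify(exceptions, n=250, p=0.01):
    """Assign a backtesting zone to an observed number of exceptions."""
    yellow = zone_threshold(0.95, n, p)
    red = zone_threshold(0.9999, n, p)
    if exceptions >= red:
        return "red"
    if exceptions >= yellow:
        return "yellow"
    return "green"


print(zone_threshold(0.95), zone_threshold(0.9999))  # prints: 5 10
print(classify(4), classify(5), classify(10))        # prints: green yellow red
```

Dividing the two thresholds by 250 gives the 2% and 4% frequencies mentioned in the text.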

The comparison of these requirements with our results presented in Table
2 shows that, on average, none of the procedures we tried is in the red
zone, and that the procedure using the empirical distribution function for
the residuals is always in the green zone. The use of the Student
distribution also yields green zone results for most of the examples, while
the procedure with Gaussian innovations is often in the yellow zone.

We conclude that the use of the scaled Student $t_5$ distribution for the
innovations slightly improves the results, and the VaR quality is acceptable
for both Gaussian and scaled Student quantiles, while the application of the
empirical distribution of the residuals leads to an almost perfect fit of the
prescribed quantiles for all considered time horizons.